Search Results: "Johannes Schauer"

4 November 2015

Johannes Schauer: Let's Encrypt with Pound on Debian

TLDR: mister-muffin.de (and all its subdomains), bootstrap.debian.net and binarycontrol.debian.net are now finally signed by "Let's Encrypt Authority X1" \o/
I just tried out the letsencrypt client Debian packages prepared by Harlan Lieberman-Berg which can be found here:
My server setup uses Pound as a reverse proxy in front of a number of LXC-based containers running the actual services. Furthermore, letsencrypt only supports Nginx and Apache for now, so I had to set things up manually anyway. Here is how. After installing the Debian packages I built from the above git repositories, I ran the following commands:
$ mkdir -p letsencrypt/etc letsencrypt/lib letsencrypt/log
$ letsencrypt certonly --authenticator manual --agree-dev-preview \
    --server https://acme-v01.api.letsencrypt.org/directory --text \
    --config-dir letsencrypt/etc --logs-dir letsencrypt/log \
    --work-dir letsencrypt/lib --email josch@mister-muffin.de \
    --domains mister-muffin.de --domains blog.mister-muffin.de \
    --domains [...]
I created the letsencrypt directory structure to be able to run letsencrypt as a normal user. Otherwise, running this command would require access to /etc/letsencrypt and other system-wide directories. Having to set this up and pass all these parameters is a bit bothersome, but there is an upstream issue about making this easier when using the "certonly" option, which in principle should not require superuser privileges. The --server option is necessary for now because "Let's Encrypt" is still in beta and one needs to register for it. Without the --server option one will get an untrusted certificate from the "happy hacker fake CA". The letsencrypt program then asks me to agree to the Terms of Service and then, for each domain I specified with the --domains option, presents me with the token content and the location under that domain where it expects to find this content. This looks like this each time:
-------------------------------------------------------------------------------
NOTE: The IP of this machine will be publicly logged as having requested this
certificate. If you're running letsencrypt in manual mode on a machine that is
not your server, please ensure you're okay with that.
Are you OK with your IP being logged?
-------------------------------------------------------------------------------
(Y)es/(N)o: Y
Make sure your web server displays the following content at
http://mister-muffin.de/.well-known/acme-challenge/XXXX before continuing:
{"header": {"alg": "RS256", "jwk": {"e": "AQAB", "kty": "RSA", "n": "YYYY"}}, "payload": "ZZZZ", "signature": "QQQQ"}
Content-Type header MUST be set to application/jose+json.
If you don't have HTTP server configured, you can run the following
command on the target server (as root):
mkdir -p /tmp/letsencrypt/public_html/.well-known/acme-challenge
cd /tmp/letsencrypt/public_html
echo -n '{"header": {"alg": "RS256", "jwk": {"e": "AQAB", "kty": "RSA", "n": "YYYY"}}, "payload": "ZZZZ", "signature": "QQQQ"}' > .well-known/acme-challenge/XXXX
# run only once per server:
$(command -v python2 || command -v python2.7 || command -v python2.6) -c \
"import BaseHTTPServer, SimpleHTTPServer; \
SimpleHTTPServer.SimpleHTTPRequestHandler.extensions_map = {'': 'application/jose+json'}; \
s = BaseHTTPServer.HTTPServer(('', 80), SimpleHTTPServer.SimpleHTTPRequestHandler); \
s.serve_forever()" 
Press ENTER to continue
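The Python one-liner suggested above only works with Python 2, where the modules are still called BaseHTTPServer and SimpleHTTPServer. A minimal Python 3 sketch of the same three manual steps (create the challenge directory, write the token content without a trailing newline, serve it with the Content-Type application/jose+json) might look like this; the helper names write_token and make_server are hypothetical and not part of letsencrypt:

```python
import functools
import http.server
import os

JOSE_CONTENT_TYPE = "application/jose+json"


def write_token(webroot, token, content):
    """Python equivalent of the suggested mkdir -p plus echo -n ... > TOKEN."""
    challenge_dir = os.path.join(webroot, ".well-known", "acme-challenge")
    os.makedirs(challenge_dir, exist_ok=True)
    path = os.path.join(challenge_dir, token)
    with open(path, "w") as f:
        f.write(content)  # no trailing newline, like echo -n
    return path


class JoseHandler(http.server.SimpleHTTPRequestHandler):
    """Serve files from a directory, always with the Content-Type letsencrypt asks for."""

    def guess_type(self, path):
        # Token files have no extension, so skip the usual mimetype guessing
        return JOSE_CONTENT_TYPE


def make_server(webroot, port=80):
    """Create (but do not start) an HTTP server for webroot on the given port."""
    # The directory= keyword of SimpleHTTPRequestHandler needs Python 3.7+
    handler = functools.partial(JoseHandler, directory=webroot)
    return http.server.HTTPServer(("", port), handler)
```

Calling write_token("/tmp/letsencrypt/public_html", "XXXX", content) once per domain and then make_server("/tmp/letsencrypt/public_html").serve_forever() would replace the copy-pasted echo lines and the one-liner. Binding to port 80 still requires root, just like the suggested command.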
For brevity, I replaced any large base64-encoded chunks of the messages with YYYY, ZZZZ and QQQQ. The token location is abbreviated as XXXX. After temporarily stopping Pound on my web server, I created the directory /tmp/letsencrypt/public_html/.well-known/acme-challenge and then opened two shells on my server, both in /tmp/letsencrypt/public_html. In one, I kept a tiny HTTP server running (the suggested Python SimpleHTTPServer works too, if Python is installed). In the other, I copy-pasted the echo line that the letsencrypt program suggested I run. I had to copy-paste that echo command for each domain I wanted to verify. This could easily be automated, so I filed an issue about this with upstream. It seems that the letsencrypt servers query each of these tokens twice: once right after I hit Enter at the message above, and once more after all tokens are in place. At the end of this ordeal I get:
2015-11-04 11:12:18,409:WARNING:letsencrypt.client:Non-standard path(s), might not work with crontab installed by your operating system package manager
IMPORTANT NOTES:
 - If you lose your account credentials, you can recover through
   e-mails sent to josch@mister-muffin.de.
 - Congratulations! Your certificate and chain have been saved at
   letsencrypt/etc/live/mister-muffin.de/fullchain.pem. Your cert will
   expire on 2016-02-02. To obtain a new version of the certificate in
   the future, simply run Let's Encrypt again.
 - Your account credentials have been saved in your Let's Encrypt
   configuration directory at letsencrypt/etc. You should make a
   secure backup of this folder now. This configuration directory will
   also contain certificates and private keys obtained by Let's
   Encrypt so making regular backups of this folder is ideal.
I can now scp the content of letsencrypt/etc/live/mister-muffin.de/* to my server. Unfortunately, Pound (and also my ejabberd XMPP server) requires the private key to be in the same file as the certificate and the chain, so on the server I also had to do:
cat /etc/ssl/private/privkey.pem /etc/ssl/private/fullchain.pem > /etc/ssl/private/private_fullchain.pem
Then I edited the Pound config to use /etc/ssl/private/private_fullchain.pem. But that's all, folks!
EDIT: It seems that manually copying over the echo commands as I described above is not necessary. Instead of the manual authenticator plugin, I can use the webroot plugin. That plugin takes the --webroot-path option and copies the tokens there. Since my webroot is on a remote machine, I could just mount it locally via sshfs and pass the mount point as --webroot-path. That I didn't realize the webroot plugin does what I want is easily explained: the only documentation of the webroot plugin, in the help output and in the man page generated from it, is "Webroot Authenticator", which is not very helpful. Another user seems to have run into similar problems. Better documenting the plugins so that such situations can be prevented in the future is tracked in this upstream bug.
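The cat command above creates the combined file with whatever permissions the current umask yields, even though the file contains the private key. A minimal Python sketch of the same concatenation (private key first, then the full chain, in the same order as the cat call) that forces mode 0600 on the result; the function name and the newline handling are illustrative assumptions, not something Pound or ejabberd require:

```python
import os


def write_combined_pem(key_path, chain_path, out_path):
    """Write the private key followed by the full chain into out_path.

    Mirrors `cat privkey.pem fullchain.pem > private_fullchain.pem`, but the
    output file is forced to mode 0600 because it contains the private key.
    """
    parts = []
    for path in (key_path, chain_path):
        with open(path, "rb") as f:
            data = f.read()
        # Make sure the next PEM block starts on its own line
        if data and not data.endswith(b"\n"):
            data += b"\n"
        parts.append(data)
    fd = os.open(out_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    # os.open's mode is filtered by the umask and ignored for existing files,
    # so set the permissions explicitly before any key material is written
    os.fchmod(fd, 0o600)
    with os.fdopen(fd, "wb") as out:
        out.write(b"".join(parts))
```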

25 October 2015

Johannes Schauer: unshare without superuser privileges

TLDR: With the help of Helmut Grohne I finally figured out most of the bits necessary to unshare everything without becoming root (though one might say that this is still cheating because the suid root tools newuidmap and newgidmap are used). I wrote a Perl script which documents how this is done in practice. This script is nearly equivalent to using the existing commands lxc-usernsexec [opts] -- unshare [opts] -- COMMAND, except that these two together cannot be used to mount a new proc. Apart from this problem, this Perl script might also be useful by itself because it is architecture-independent and easily inspectable for the curious mind without resorting to sources.debian.net (it is heavily documented, with nearly 2 lines of comments per line of code on average). It can be retrieved from https://gitlab.mister-muffin.de/josch/user-unshare/blob/master/user-unshare
Long story: Nearly two years after my last rant about everything needing superuser privileges in Linux, I'm still interested in techniques that let me do more things without becoming root. Helmut Grohne had been telling me for a while about unshare(), or user namespaces, as the right way to have things like chroot without root. There are also reports of LXC containers working without root privileges, but they are hard to come by. A couple of days ago I had some time again, so Helmut helped me get past the major blockers that had so far stopped me from using unshare in a meaningful way without executing everything with sudo. My main motivation at that point was to let dpkg-buildpackage, when executed by sbuild, run with an unshared network namespace and thus without network access (except for the loopback interface), because, like pbuilder, I wanted sbuild to enforce the rule not to access any remote resources during the build.
After several evenings of investigating and tinkering with the Perl script I mentioned initially, I came to the conclusion that the only place that can unshare the network namespace without disrupting anything is schroot itself. Unsharing inside the chroot fails because dpkg-buildpackage runs without root privileges, so the user namespace has to be unshared as well, but that destroys all ownership information. And even if that weren't the case, the chroot itself is unlikely to have (and also should not have) tools like ip or newuidmap and newgidmap installed. Running the schroot call itself in unshared namespaces will not work either: again, we first need to unshare the user namespace, and then schroot complains about wrong ownership of its configuration file /etc/schroot/schroot.conf. Luckily, when I contacted Roger Leigh about this wishlist feature in bug#802849, I was told that it was already implemented in schroot's git master \o/. So this particular problem seems to be taken care of, and once the next schroot release happens, sbuild will make use of it and have unshare --net capabilities, just like pbuilder has had since last year.
With the sbuild case taken care of, the rest of this post introduces the Perl script I wrote. The name user-unshare is really arbitrary; I just needed some identifier for the git repository and a filename. The most important discovery I made was that Debian disables unprivileged user namespaces by default with the patch add-sysctl-to-disallow-unprivileged-CLONE_NEWUSER-by-default.patch to the Linux kernel. To enable them, one first has to run either
echo 1 | sudo tee /proc/sys/kernel/unprivileged_userns_clone > /dev/null
or
sudo sysctl -w kernel.unprivileged_userns_clone=1
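Before changing the setting, it can help to check its current value. A small Python sketch that reads the sysctl file named above; the function name is hypothetical, and treating a missing file as "this knob does not exist" assumes a kernel built without the Debian patch:

```python
SYSCTL_PATH = "/proc/sys/kernel/unprivileged_userns_clone"


def unprivileged_userns_enabled(path=SYSCTL_PATH):
    """Return True/False from the Debian-specific sysctl, or None if it is absent."""
    try:
        with open(path) as f:
            return f.read().strip() == "1"
    except FileNotFoundError:
        # Assumption: kernels without the Debian patch do not have this file
        return None
```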
The tool tries to be like unshare(1) but with the power of lxc-usernsexec(1) to map more than one id into the new user namespace by using the programs newgidmap and newuidmap. Or in other words: This tool tries to be like lxc-usernsexec(1) but with the power of unshare(1) to unshare more than just the user and mount namespaces. It is nearly equal to calling:
lxc-usernsexec [opts] -- unshare [opts] -- COMMAND
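Mapping more than one id is what newuidmap and newgidmap are for: they take the pid of the target process followed by triples of (first id inside the namespace, first id outside, count), and only accept ranges granted to the caller in /etc/subuid or /etc/subgid (lines of the form name:start:count). A minimal Python sketch of how such an argument list might be assembled; the helper names are hypothetical and this is not the code of user-unshare. It mirrors the lxc-usernsexec -m b:0:1000:1 -m b:1:558752:1 example used further down in this post, with the size of the second range as a parameter:

```python
def parse_subid(text, user):
    """Return (start, count) of the first range for `user` in /etc/subuid-style text."""
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, start, count = line.split(":")
        if name == user:
            return int(start), int(count)
    raise LookupError("no subordinate id range for %r" % user)


def idmap_args(pid, outer_id, sub_start, sub_count):
    """Build the arguments for newuidmap/newgidmap.

    Maps id 0 inside the namespace to the caller's own id (outer_id) and
    ids 1..sub_count to the subordinate range starting at sub_start.
    """
    return [str(pid),
            "0", str(outer_id), "1",
            "1", str(sub_start), str(sub_count)]
```

In this sketch the parent process would run ["newuidmap"] + idmap_args(child_pid, os.getuid(), start, count) after the child has unshared its user namespace, and do the same with newgidmap for the group ids.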
Its main reasons for existence are:
- I hoped that systemd-nspawn could do what I wanted, but it seems that its requirement to be run as root will not change any time soon.
- Another tool in Debian that offers to do chroot without superuser privileges is linux-user-chroot, but that one cheats by being suid root.
Had I found lxc-usernsexec earlier, I would probably not have written this. But after I found it, I happily used it to get an even better understanding of the matter and to further improve the comments in my code. I started writing my own tool in Perl because that's the language sbuild is written in and, as mentioned initially, I intended to use this script with sbuild. Now that the sbuild problem is taken care of, this is not so important anymore, but I like being able to read the code of simple programs I run directly from /usr/bin without having to retrieve the source code first or use sources.debian.net.
The only thing I wasn't able to figure out is how to properly mount proc into my new mount namespace. I found a workaround that works by first mounting a new proc to /proc and then bind-mounting /proc to whatever new location for proc is requested. I didn't figure out how to do this without mounting to /proc first, partly also because this doesn't work at all when using lxc-usernsexec and unshare together. In this respect, this Perl script is a bit more powerful than those two tools together. I suppose the reason is that unshare wasn't written with being called without superuser privileges in mind. If you have an idea what could be wrong, the code has a big FIXME about this issue.
Finally, here is a demonstration of what my script can do. Because of the /proc bug, lxc-usernsexec and unshare together are not able to do this, but it might also be that I'm just not using these tools in the right way. The following will give you an interactive shell in an environment created from one of my sbuild chroot tarballs:
$ mkdir -p /tmp/buildroot/proc
$ ./user-unshare --mount-proc=/tmp/buildroot/proc --ipc --pid --net \
    --uts --mount --fork -- sh -c 'ip link set lo up && ip addr && \
    hostname hoothoot-chroot && \
    tar -C /tmp/buildroot -xf /srv/chroot/unstable-amd64.tar.gz; \
    /usr/sbin/chroot /tmp/buildroot /sbin/runuser -s /bin/bash - josch && \
    umount /tmp/buildroot/proc && rm -rf /tmp/buildroot'
(unstable-amd64-sbuild)josch@hoothoot-chroot:/$ whoami
josch
(unstable-amd64-sbuild)josch@hoothoot-chroot:/$ hostname
hoothoot-chroot
(unstable-amd64-sbuild)josch@hoothoot-chroot:/$ ls -lha /proc | head
total 0
dr-xr-xr-x 218 nobody nogroup    0 Oct 25 19:06 .
drwxr-xr-x  22 root   root     440 Oct  1 08:42 ..
dr-xr-xr-x   9 root   root       0 Oct 25 19:06 1
dr-xr-xr-x   9 josch  josch      0 Oct 25 19:06 15
dr-xr-xr-x   9 josch  josch      0 Oct 25 19:06 16
dr-xr-xr-x   9 root   root       0 Oct 25 19:06 7
dr-xr-xr-x   9 josch  josch      0 Oct 25 19:06 8
dr-xr-xr-x   4 nobody nogroup    0 Oct 25 19:06 acpi
dr-xr-xr-x   6 nobody nogroup    0 Oct 25 19:06 asound
Of course, instead of running this long command, we can also write a small shell script (here saved as chroot.sh) and execute that. The following does the same things as the long command above but adds some comments for further explanation:
#!/bin/sh

set -exu

# I'm using /tmp because I have it mounted as a tmpfs
rootdir="/tmp/buildroot"

# bring the loopback interface up
ip link set lo up

# show that the loopback interface is really up
ip addr

# make use of the UTS namespace being unshared
hostname hoothoot-chroot

# extract the chroot tarball. This must be done inside the user namespace for
# the file permissions to be correct.
#
# tar will fail to call mknod and to change the permissions of /proc but we are
# ignoring that
tar -C "$rootdir" -xf /srv/chroot/unstable-amd64.tar.gz || true

# run chroot and inside, immediately drop permissions to the user "josch" and
# start an interactive shell
/usr/sbin/chroot "$rootdir" /sbin/runuser -s /bin/bash - josch

# unmount /proc and remove the temporary directory
umount "$rootdir/proc"
rm -rf "$rootdir"
and then:
$ mkdir -p /tmp/buildroot/proc
$ ./user-unshare --mount-proc=/tmp/buildroot/proc --ipc --pid --net --uts --mount --fork -- ./chroot.sh
As mentioned in the beginning, the tool is nearly equivalent to calling lxc-usernsexec [opts] -- unshare [opts] -- COMMAND, but because of the problem with mounting proc (mentioned earlier), lxc-usernsexec and unshare cannot be used for the above example. If one tries anyway, one only gets:
$ lxc-usernsexec -m b:0:1000:1 -m b:1:558752:1 -- unshare --mount-proc=/tmp/buildroot/proc --ipc --pid --net --uts --mount --fork -- ./chroot.sh
unshare: mount /tmp/buildroot/proc failed: Invalid argument
I'd be interested in finding out why that is and how to fix it.

18 October 2015

Lunar: Reproducible builds: week 25 in Stretch cycle

What happened in the reproducible builds effort this week:
Toolchain fixes
Niko Tyni wrote a new patch adding support for SOURCE_DATE_EPOCH in Pod::Man. This would complement or replace the previously implemented POD_MAN_DATE environment variable in a more generic way. Niko Tyni proposed a fix to prevent mtime variation in directories due to debhelper's usage of cp --parents -p.
Packages fixed
The following 119 packages became reproducible due to changes in their build dependencies: aac-tactics, aafigure, apgdiff, bin-prot, boxbackup, calendar, camlmix, cconv, cdist, cl-asdf, cli-common, cluster-glue, cppo, cvs, esdl, ess, faucc, fauhdlc, fbcat, flex-old, freetennis, ftgl, gap, ghc, git-cola, globus-authz-callout-error, globus-authz, globus-callout, globus-common, globus-ftp-client, globus-ftp-control, globus-gass-cache, globus-gass-copy, globus-gass-transfer, globus-gram-client, globus-gram-job-manager-callout-error, globus-gram-protocol, globus-gridmap-callout-error, globus-gsi-callback, globus-gsi-cert-utils, globus-gsi-credential, globus-gsi-openssl-error, globus-gsi-proxy-core, globus-gsi-proxy-ssl, globus-gsi-sysconfig, globus-gss-assist, globus-gssapi-error, globus-gssapi-gsi, globus-net-manager, globus-openssl-module, globus-rsl, globus-scheduler-event-generator, globus-xio-gridftp-driver, globus-xio-gsi-driver, globus-xio, gnome-control-center, grml2usb, grub, guilt, hgview, htmlcxx, hwloc, imms, kde-l10n, keystone, kimwitu++, kimwitu-doc, kmod, krb5, laby, ledger, libcrypto++, libopendbx, libsyncml, libwps, lprng-doc, madwimax, maria, mediawiki-math, menhir, misery, monotone-viz, morse, mpfr4, obus, ocaml-csv, ocaml-reins, ocamldsort, ocp-indent, openscenegraph, opensp, optcomp, opus, otags, pa-bench, pa-ounit, pa-test, parmap, pcaputils, perl-cross-debian, prooftree, pyfits, pywavelets, pywbem, rpy, signify, siscone, swtchart, tipa, typerep, tyxml, unison2.32.52, unison2.40.102, unison, uuidm, variantslib, zipios++, zlibc, zope-maildrophost.
The following packages became reproducible after getting fixed:
Packages which could not be tested:
Some uploads fixed some reproducibility issues but not all of them:
Patches submitted which have not made their way to the archive yet:
Lunar reported that test strings in ongl depend on the default character encoding of the build system.
reproducible.debian.net
The 189 packages composing the Arch Linux core repository are now being tested. No packages are currently reproducible, but most of the time the difference is limited to metadata. This has already gained some interest in the Arch Linux community. An explicit log message is now visible when a build has been killed due to the 12-hour timeout. (h01ger) The remote build setup has been made more robust and self-maintenance has been further improved. (h01ger) The minimum age for rescheduling already tested amd64 packages has been lowered from 14 to 7 days, thanks to the increase of hardware resources sponsored by ProfitBricks last week. (h01ger)
diffoscope development
diffoscope version 37 was released on October 15th. It adds support for two new file formats (CBFS images and Debian .dsc files). After the required changes to TLSH were proposed, fuzzy hashes are now computed incrementally. This avoids reading entire files into memory, which caused problems for large packages. New tests have been added for the command-line interface. More character encoding issues have been fixed. Malformed md5sums are now compared as binary files instead of making diffoscope crash, amongst several other minor fixes. Version 38 was released two days later to fix the versioned dependency on python3-tlsh.
strip-nondeterminism development
strip-nondeterminism version 0.013-1 has been uploaded to the archive. It fixes an issue with nonconformant PNG files with trailing garbage, reported by Roland Rosenfeld.
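The incremental hashing mentioned in the diffoscope 37 notes boils down to feeding a file to the hash in fixed-size chunks instead of reading it into memory at once. A minimal Python sketch of that pattern, using hashlib's SHA-256 as a stand-in because TLSH is not part of the standard library (diffoscope itself uses TLSH through python3-tlsh); the function name and default chunk size are assumptions:

```python
import hashlib


def hash_file_incrementally(path, chunk_size=64 * 1024):
    """Hash a file chunk by chunk so memory use stays bounded by chunk_size."""
    h = hashlib.sha256()  # stand-in for a fuzzy hash such as TLSH
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```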
disorderfs development
disorderfs version 0.4.1-1 is a stop-gap release that disables lock propagation unless --share-locks=yes is specified, as lock propagation is still affected by unidentified issues.
Documentation update
Lunar has been busy creating a proper website for reproducible-builds.org that would be a common location for news, documentation, and tools for all free software projects working on reproducible builds. It's not yet ready to be published, but it's surely getting there.
Package reviews
103 reviews have been removed, 394 added and 29 updated this week. 72 FTBFS issues were reported by Chris West and Niko Tyni. New issues: random_order_in_static_libraries, random_order_in_md5sums.
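The idea behind SOURCE_DATE_EPOCH, as in Niko Tyni's Pod::Man patch, is that a tool which would otherwise embed the current date uses the timestamp (seconds since the Unix epoch) from that environment variable when it is set. A minimal Python sketch of that convention for a generic tool; the function name and date format are illustrative, and formatting in UTC keeps the output independent of the build machine's time zone:

```python
import os
import time


def build_date(environ=None, fmt="%Y-%m-%d"):
    """Date to embed in generated files such as man pages."""
    if environ is None:
        environ = os.environ
    epoch = environ.get("SOURCE_DATE_EPOCH")
    # Fall back to the current time only when the variable is not set
    seconds = int(epoch) if epoch is not None else time.time()
    # Use UTC so the result does not depend on the TZ of the build machine
    return time.strftime(fmt, time.gmtime(seconds))
```

For example, build_date({"SOURCE_DATE_EPOCH": "0"}) returns "1970-01-01" no matter when or where the build runs.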

16 October 2015

Norbert Preining: Debian/TeX Live multiarch update

This is a big update of all related packages (tex-common 6.04, texlive-bin 2015.20150524.37493-7, texlive-base/lang/extra package 2015.20151016-1) due to the move to multi-arch support. Of course, the regular updates of TeX Live are included, too. With this change it should be possible to run a multi-arch system with only one TeX Live installed. Thanks to the excellent support and testing of the Multi-arch guys, in particular Thorsten Glaser, Helmut Grohne, Johannes Schauer, and Wookey, I learned a lot about multi-arch, and I hope that the current setup is safe. All packages except the various lib* packages are tagged as Multi-Arch: foreign, while the lib packages are tagged Multi-Arch: same. Anyway, if you find a bug concerning multi-arch, that is, if some of the programs exhibit architecture information, please let us know via a bug report.
Updated packages
acro, alegreya, amiri, assoccnt, attachfile, babel-french, babel-hungarian, barr, beebe, biblatex-philosophy, bidi, bnumexpr, caption, chemfig, chemformula, chemmacros, cjk-gs-integrate, csplain, dantelogo, dataref, dtxgen, dvipdfmx-def, dvips, eledmac, elements, fcolumn, fithesis, fontspec, genealogytree, gradstudentresume, gtl, jfontmaps, knuth-local, koma-script, kotex-oblivoir, kotex-plain, kotex-utf, kpathsea, l3build, l3experimental, l3kernel, l3packages, latex, latexconfig, ledmac, ltxfileinfo, lualatex-math, luamplib, luatex, luatexbase, luatexja, luatexko, make4ht, mcf2graph, mflogo, modiagram, multiexpand, newtx, odsfile, old-arrows, paracol, pdfpages, pdftex, plain, pst-stru, pxchfon, randomwalk, reledmac, resumecls, rubik, selnolig, showhyphens, siunitx, suftesi, tetex, teubner, tex4ebook, tex4ht, texlive-scripts, tikzsymbols, tipfr, tools, tudscr, uassign, unicode-math, unravel, visualfaq, xepersian, xetex-def, xint.
New packages
archaeologie, ctablestack, dynamicnumber, exercises, fibeamer, h2020proposal, imfellenglish, lstbayes, tempora, xellipsis.
Enjoy.

4 October 2015

Johannes Schauer: new sbuild release 0.66.0

I just released sbuild 0.66.0-1 into unstable. It fixes a whopping 30 bugs! Thus, I'd like to use this platform to: And a super big thank you to Roger Leigh who, despite having resigned from Debian, was always available to give extremely helpful hints, tips, opinion and guidance with respect to sbuild development. Thank you! Here is a list of the major changes since the last release:

3 August 2015

Lunar: Reproducible builds: week 14 in Stretch cycle

What happened in the reproducible builds effort this week:
Toolchain fixes
akira submitted a patch to make cdbs export SOURCE_DATE_EPOCH. She uploaded a package with the enhancement to the experimental reproducible repository.
Packages fixed
The following 15 packages became reproducible due to changes in their build dependencies: dracut, editorconfig-core, elasticsearch, fish, libftdi1, liblouisxml, mk-configure, nanoc, octave-bim, octave-data-smoothing, octave-financial, octave-ga, octave-missing-functions, octave-secs1d, octave-splines, valgrind.
The following packages became reproducible after getting fixed:
Some uploads fixed some reproducibility issues but not all of them:
In contrib, Dmitry Smirnov improved libdvd-pkg with 1.3.99-1-1.
Patches submitted which have not made their way to the archive yet:
reproducible.debian.net
Four armhf build hosts were provided by Vagrant Cascadian and have been configured to be used by jenkins.debian.net. Work on including armhf builds in the reproducible.debian.net webpages has begun. So far the repository comparison page just shows us which armhf binary packages are currently missing in our repo. (h01ger) The scheduler has been changed to re-schedule more packages from stretch than sid, as the gcc5 transition has started. This mostly affects build log age. (h01ger) A new depwait status has been introduced for packages which can't be built because of missing build dependencies. (Mattia Rizzolo)
debbindiff development
Finally, on July 31st, Lunar released debbindiff 27, containing a complete overhaul of the code for the comparison stage. The new architecture is more versatile and extensible while minimizing code duplication. libarchive is now used to handle cpio archives and iso9660 images through the newly packaged python-libarchive-c. This should also help support a couple of other archive formats in the future. Symlinks and devices are now properly compared.
Text files are compared as Unicode after being decoded, and encoding differences are reported. Support for Sqlite3 and Mono/.NET executables has been added. Thanks to Valentin Lorentz, the test suite should now run on more systems. A small deficiency in unsquashfs has been identified in the process. A long-standing optimization is now performed on Debian packages: based on the content of the md5sums control file, we skip comparing files with matching hashes. This makes debbindiff usable on packages with many files. Fuzzy-matching is now performed for files in the same container (like a tarball) to handle renames. Also, for Debian .changes, listed files are now compared without looking at the embedded version number. This makes debbindiff a lot more useful when comparing different versions of the same package. Based on the rearchitecturing, work has been done to allow parallel processing. The branch now seems to work most of the time. More tests need to be done before it can be merged. The current fuzzy-matching algorithm, ssdeep, has shown disappointing results. One important use case is being able to properly compare debug symbols. Their path is made using the Build ID. As this identifier is made with a checksum of the binary content, the paths differ whenever the binaries differ, so the files can only be paired up by fuzzy matching; and finding things like CPP macros is much easier when a diff of the debug symbols is available. The good news is that TLSH, another fuzzy-matching algorithm, has been tested with much better results. A package is waiting in NEW and the code is ready for it to become available. A follow-up release 28 was made on August 2nd, fixing the content label used for gzip, bzip2 and xz files and an error on text files only differing in their encoding. It also contains a small code improvement in how comments on Difference objects are handled. This is the last release named debbindiff. A new name has been chosen to better reflect that it is not a Debian-specific tool. Stay tuned!
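The md5sums optimization can be sketched as follows: the md5sums control file of a binary package lists an MD5 hash and a path on each line, so files whose hashes are equal in both packages need no detailed comparison. This is a minimal Python illustration with hypothetical function names, not debbindiff's actual code:

```python
def parse_md5sums(text):
    """Map each path to its hash, from md5sums content ("<hash>  <path>" lines)."""
    sums = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # maxsplit=1 keeps paths containing spaces intact
        digest, path = line.split(None, 1)
        sums[path] = digest
    return sums


def paths_to_compare(md5sums_a, md5sums_b):
    """Paths that are missing from one package or whose hashes differ."""
    a = parse_md5sums(md5sums_a)
    b = parse_md5sums(md5sums_b)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```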
Documentation update
Valentin Lorentz updated the patch submission template to suggest writing the kind of issue in the bug subject. Some small progress has been made on the Reproducible Builds HOWTO while preparing the related CCCamp15 talk.
Package reviews
235 obsolete reviews have been removed, 47 added and 113 updated this week. 42 reports for packages failing to build from source have been made by Chris West (Faux). New issue added this week: haskell_devscripts_locale_substvars.
Misc.
Valentin Lorentz wrote a script to report installed packages that have been tested as unreproducible. We encourage everyone to run it on their systems and give feedback!

12 July 2015

Lunar: Reproducible builds: week 11 in Stretch cycle

Debian is undertaking a huge effort to develop a reproducible builds system. I'd like to thank you for that. This could be Debian's most important project, with how badly computer security has been going.

(PerniciousPunk in Reddit's "Ask me anything!" to Neil McGovern, DPL.)
What happened in the reproducible builds effort this week:
Toolchain fixes
More tools are getting patched to use the value of the SOURCE_DATE_EPOCH environment variable as the current time:

In the reproducible experimental toolchain, the following have been uploaded:
Johannes Schauer followed up on making the sbuild build path deterministic with several ideas.
Packages fixed
The following 311 packages became reproducible due to changes in their build dependencies: 4ti2, alot, angband, appstream-glib, argvalidate, armada-backlight, ascii, ask, astroquery, atheist, aubio, autorevision, awesome-extra, bibtool, boot-info-script, bpython, brian, btrfs-tools, bugs-everywhere, capnproto, cbm, ccfits, cddlib, cflow, cfourcc, cgit, chaussette, checkbox-ng, cinnamon-settings-daemon, clfswm, clipper, compton, cppcheck, crmsh, cupt, cutechess, d-itg, dahdi-tools, dapl, darnwdl, dbusada, debian-security-support, debomatic, dime, dipy, dnsruby, doctrine, drmips, dsc-statistics, dune-common, dune-istl, dune-localfunctions, easytag, ent, epr-api, esajpip, eyed3, fastjet, fatresize, fflas-ffpack, flann, flex, flint, fltk1.3, fonts-dustin, fonts-play, fonts-uralic, freecontact, freedoom, gap-guava, gap-scscp, genometools, geogebra, git-reintegrate, git-remote-bzr, git-remote-hg, gitmagic, givaro, gnash, gocr, gorm.app, gprbuild, grapefruit, greed, gtkspellmm, gummiboot, gyp, heat-cfntools, herold, htp, httpfs2, i3status, imagetooth, imapcopy, imaprowl, irker, jansson, jmapviewer, jsdoc-toolkit, jwm, katarakt, khronos-opencl-man, khronos-opengl-man4, lastpass-cli, lava-coordinator, lava-tool, lavapdu, letterize, lhapdf, libam7xxx, libburn, libccrtp, libclaw, libcommoncpp2, libdaemon, libdbusmenu-qt, libdc0, libevhtp, libexosip2, libfreenect, libgwenhywfar, libhmsbeagle, libitpp, libldm, libmodbus, libmtp, libmwaw, libnfo, libpam-abl, libphysfs, libplayer, libqb, libsecret, libserial, libsidplayfp, libtime-y2038-perl, libxr, lift, linbox, linthesia, livestreamer, lizardfs, lmdb, log4c, logbook, lrslib, lvtk, m-tx, mailman-api, matroxset, miniupnpd, mknbi, monkeysign, mpi4py, mpmath, mpqc, mpris-remote, musicbrainzngs, network-manager, nifticlib, obfsproxy, ogre-1.9, opal,
openchange, opensc, packaging-tutorial, padevchooser, pajeng, paprefs, pavumeter, pcl, pdmenu, pepper, perroquet, pgrouting, pixz, pngcheck, po4a, powerline, probabel, profitbricks-client, prosody, pstreams, pyacidobasic, pyepr, pymilter, pytest, python-amqp, python-apt, python-carrot, python-django, python-ethtool, python-mock, python-odf, python-pathtools, python-pskc, python-psutil, python-pypump, python-repoze.tm2, python-repoze.what, qdjango, qpid-proton, qsapecng, radare2, reclass, repsnapper, resource-agents, rgain, rttool, ruby-aggregate, ruby-albino, ruby-archive-tar-minitar, ruby-bcat, ruby-blankslate, ruby-coffee-script, ruby-colored, ruby-dbd-mysql, ruby-dbd-odbc, ruby-dbd-pg, ruby-dbd-sqlite3, ruby-dbi, ruby-dirty-memoize, ruby-encryptor, ruby-erubis, ruby-fast-xs, ruby-fusefs, ruby-gd, ruby-git, ruby-globalhotkeys, ruby-god, ruby-hike, ruby-hmac, ruby-integration, ruby-jnunemaker-matchy, ruby-memoize, ruby-merb-core, ruby-merb-haml, ruby-merb-helpers, ruby-metaid, ruby-mina, ruby-net-irc, ruby-net-netrc, ruby-odbc, ruby-ole, ruby-packet, ruby-parseconfig, ruby-platform, ruby-plist, ruby-popen4, ruby-rchardet, ruby-romkan, ruby-ronn, ruby-rubyforge, ruby-rubytorrent, ruby-samuel, ruby-shoulda-matchers, ruby-sourcify, ruby-test-spec, ruby-validatable, ruby-wirble, ruby-xml-simple, ruby-zoom, rumor, rurple-ng, ryu, sam2p, scikit-learn, serd, shellex, shorewall-doc, shunit2, simbody, simplejson, smcroute, soqt, sord, spacezero, spamassassin-heatu, spamprobe, sphinxcontrib-youtube, splitpatch, sratom, stompserver, syncevolution, tgt, ticgit, tinyproxy, tor, tox, transmissionrpc, tweeper, udpcast, units-filter, viennacl, visp, vite, vmfs-tools, waffle, waitress, wavtool-pl, webkit2pdf, wfmath, wit, wreport, x11proto-input, xbae, xdg-utils, xdotool, xsystem35, yapsy, yaz. Please note that some packages in the above list are falsely reproducible. 
In the experimental toolchain, debhelper exported TZ=UTC, and this made packages capturing the current date (without the time) reproducible in the current test environment.
The following packages became reproducible after getting fixed:
Ben Hutchings upstreamed several patches to fix Linux reproducibility issues, which were quickly merged.
Some uploads fixed some reproducibility issues but not all of them:
Uploads that should fix packages not in main:
Patches submitted which have not made their way to the archive yet:
reproducible.debian.net
A new package set has been added for lua maintainers. (h01ger) tracker.debian.org now only shows reproducibility issues for unstable. Holger and Mattia worked on several bugfixes and enhancements: finishing the initial test setup for NetBSD, rewriting more shell scripts in Python, saving UDD requests, and more.
debbindiff development
Reiner Herrmann fixed text comparison of files with different encodings.
Documentation update
Juan Picca added the installation of the locales-all package to the commands needed for setting up a local test chroot.
Package reviews
286 obsolete reviews have been removed, 278 added and 243 updated this week. 43 new bugs for packages failing to build from source have been filed by Chris West (Faux), Mattia Rizzolo, and h01ger. The following new issues have been added: timestamps_in_manpages_generated_by_ronn, timestamps_in_documentation_generated_by_org_mode, and timestamps_in_pdf_generated_by_matplotlib.
Misc.
Reiner Herrmann has submitted patches for OpenWrt. Chris Lamb cleaned up some code and removed cruft in the misc.git repository. Mattia Rizzolo updated the prebuilder script to match what is currently done on reproducible.debian.net.

22 June 2015

Lunar: Reproducible builds: week 8 in Stretch cycle

What happened in the reproducible builds effort this week: Toolchain fixes Andreas Henriksson has improved Johannes Schauer's initial patch for pbuilder adding support for build profiles. Packages fixed The following 12 packages became reproducible due to changes in their build dependencies: collabtive, eric, file-rc, form-history-control, freehep-chartableconverter-plugin, jenkins-winstone, junit, librelaxng-datatype-java, libwildmagic, lightbeam, puppet-lint, tabble. The following packages became reproducible after getting fixed: Some uploads fixed some reproducibility issues but not all of them: Patches submitted which have not made their way to the archive yet: reproducible.debian.net Bugs with the ftbfs usertag are now visible on the bug graphs. This explains the recent spike. (h01ger) Andreas Beckmann suggested a way to test building packages using the unusual paths one gets when they contain the full Debian package version string. debbindiff development Lunar started an important refactoring introducing abstractions for containers and files in order to make file type identification more flexible, enabling fuzzy matching, and allowing parallel processing. Documentation update Ximin Luo detailed the proposal to standardize environment variables to pass a reference source date to tools that need one (e.g. documentation generators). Package reviews 41 obsolete reviews have been removed, 168 added and 36 updated this week. Some more issues affecting packages failing to build from source have been identified. Meetings Minutes have been posted for the Tuesday June 16th meeting. The next meeting is scheduled for Tuesday June 23rd at 17:00 UTC. Presentations Lunar presented the project in French during Pas Sage en Seine in Paris. Video and slides are available.

4 February 2015

Johannes Schauer: I became a Debian Developer

Thanks to akira for the confetti to celebrate the occasion!

15 January 2015

Lunar: 80%

Unfortunately I could not go on stage at the 31st Chaos Communication Congress to present reproducible builds in Debian alongside Mike Perry from the Tor Project and Seth Schoen from the Electronic Frontier Foundation. I've tried to make up for it, though, and we have made amazing progress. Wiki reorganization What was a massive and frightening wiki page now looks much more welcoming: [Figure: Screenshot of ReproducibleBuilds on Debian wiki] Depending on what one is looking for, it should be much easier to find. There's now a high-level status overview on the landing page, maintainers can learn how to make their packages reproducible, enthusiasts can more easily find what can help the project, and we have even started writing some history. .buildinfo for all packages New year's eve saw me hacking Perl to write dpkg-genbuildinfo. Similar to dpkg-genchanges, it's run by dpkg-buildpackage to produce .buildinfo control files. This is where the build environment and the hashes of source and binary packages are recorded. This script, integrated with dpkg, replaces the previous interim debhelper solution written by Niko Tyni. We used to fix mtimes in control.tar and data.tar using a specific addition to debhelper named dh_fixmtimes. To better support the ALWAYS_EXCLUDE environment variable and for pragmatic reasons, we moved the process into dh_builddeb. Both changes were quickly pushed to our continuous integration platform. Before, only packages using dh would create a .buildinfo and thus eventually be considered reproducible. With these modifications, many more packages had their chance, and it shows: [Figure: Growing amount of packages considered reproducible] Yes, with our experimental toolchain we are now at more than eighty percent! That's more than 17200 source packages! srebuild Another big item on the todo-list was crossed off by Johannes Schauer. srebuild is a wrapper around sbuild:
Given a .buildinfo file, it first finds a timestamp of Debian Sid from snapshot.debian.org which contains the requested packages in their exact versions. It then runs sbuild with the right architecture, as given by the .buildinfo file, and the right base system to upgrade from, as given by the version of the base-files package in the .buildinfo file. Using two hooks, it installs the right package versions and verifies that the installed packages have the right versions before the build starts.
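The first srebuild step can be thought of as an interval intersection: if we know, for each required package version, the span of snapshot timestamps during which it was in Sid, then any timestamp inside all spans will do. The following Python sketch is hypothetical (the function name and the example spans are made up, and it is not srebuild's actual code, which queries snapshot.debian.org); it simply picks the earliest such timestamp.

```python
from datetime import datetime
from typing import Dict, Optional, Tuple

# (first_seen, last_seen) of one exact package version in Sid
Interval = Tuple[datetime, datetime]


def find_common_timestamp(spans: Dict[str, Interval]) -> Optional[datetime]:
    """Return the earliest timestamp lying inside every span, or None if
    no single snapshot contains all requested versions."""
    if not spans:
        return None
    latest_start = max(start for start, _ in spans.values())
    earliest_end = min(end for _, end in spans.values())
    return latest_start if latest_start <= earliest_end else None


# Made-up example data, not real archive history.
spans = {
    "foo=1.0-1": (datetime(2014, 12, 20), datetime(2015, 3, 1)),
    "bar=2.3-2": (datetime(2015, 1, 5), datetime(2015, 2, 10)),
    "base-files=8": (datetime(2014, 10, 1), datetime(2015, 4, 26)),
}
print(find_common_timestamp(spans))  # 2015-01-05 00:00:00
```

Taking the maximum of the start times and the minimum of the end times is enough because the intersection of intervals is itself an interval.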
Understanding problems Over 1700 packages have now been reviewed to understand why build results could not be reproduced on our experimental platform. The variations between the two builds are currently limited to time and file ordering, but this has still uncovered many problems. There are still toolchain fixes to be made (more than 180 packages for the PHP registry) which can make many packages reproducible at once, but others, like C pre-processor macros, will require many individual changes. debbindiff, the main tool used to understand differences, has gained support for .udeb, TrueType and OpenType fonts, PNG and PDF files. It's less likely to crash on problems with encodings or external tools. But most importantly for large packages, it has been made a lot faster, thanks to Reiner Herrmann and Helmut Grohne. Helmut has also been able to spot cross-compilation issues by using debbindiff! Targeting our efforts It gives warm fuzzy feelings to hit the 80% mark, but it would be a bit irrelevant if it did not concern packages that matter. Thankfully, Holger worked on producing statistics for more specific package sets. Mattia Rizzolo has also done great work to improve the scripts generating the various pages visible on reproducible.debian.net. All essential and build-essential packages, except gcc and bash, are considered reproducible or have patches ready. After some lengthy builds, I also managed to come up with a patch to make linux build reproducibly. Miscellaneous After my initial attempt to modify r-base to remove a timestamp in R packages, Dirk Eddelbuettel discussed the issue with upstream and came up with a better patch. The latter has already been merged upstream! Dirk's solution is to allow timestamps to be set using an external environment variable. This is also how I modified FontForge to make it possible to reproduce fonts. Identifiers generated by xsltproc have also been an issue. 
After reviewing my initial patch, Andrew Awyer came up with a much nicer solution. Its potential performance implications need to be evaluated before submission, though. Chris West has been working on packages built with Maven, amongst other things. PDFs generated by GhostScript, another painful source of trouble, are being worked on by Peter De Wachter. Holger got X.509 certificates signed by the CA cartel for jenkins.debian.net and reproducible.debian.net. No more scary security messages now. Let's hope next year we will be able to get certificates through Let's Encrypt! Let's make a difference together As you can imagine with all that happened in the past weeks, the #debian-reproducible IRC channel has been a cool place to hang out. It's very energizing to get together and share contributions, exchange tips and discuss the hardest points. Mandatory quote:
* h01ger is very happy to see again and again how this is a nice
         learning circle...! i've learned a whole lot here too... in
         just 3 months... and its going on...!
Reproducible builds are not going to change anything for most of our users. They simply don't care how they get software on their computer. But they care about getting the right software without having to worry about it. That's our responsibility, as developers. Enabling users to trust their software is important and a major contribution that we, as Debian, can make to the wider free software movement. Once Jessie is released, we should make a collective effort to make reproducible builds a highlight of our next release.
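Dirk Eddelbuettel's fix for R and the FontForge change both let an external environment variable override an embedded timestamp. Below is a minimal Python sketch of that pattern. It uses SOURCE_DATE_EPOCH, the variable name the reproducible builds project later standardized; treat that name as an assumption here, since the R and FontForge patches may use different ones. The function names are hypothetical.

```python
import os
import time


def build_timestamp() -> int:
    """Timestamp (seconds since the Unix epoch) to embed in generated files.

    SOURCE_DATE_EPOCH wins when set, so rebuilding the same source yields
    identical output; otherwise fall back to the current time.
    """
    value = os.environ.get("SOURCE_DATE_EPOCH")
    if value is not None:
        return int(value)
    return int(time.time())


def header_line() -> str:
    # Format in UTC so the result does not depend on the builder's TZ.
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(build_timestamp()))
    return f"Generated on {stamp} UTC"


os.environ["SOURCE_DATE_EPOCH"] = "1420070400"
print(header_line())  # Generated on 2015-01-01 00:00:00 UTC
```

Formatting with gmtime rather than localtime matters too: otherwise the builder's time zone would leak into the output, which is the same effect that exporting TZ=UTC hides.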

30 November 2014

Johannes Schauer: simple email setup

I was unable to find a good place that describes how to create a simple self-hosted email setup. The most surprising discovery was how much already works after:
apt-get install postfix dovecot-imapd
Right after finishing the installation I was able to receive email (but only in /var/mail in mbox format) and send email (but not from any other host). So while I expected a pretty complex setup, it turned out to boil down to just adjusting some configuration parameters.

Postfix The two interesting files to configure postfix are /etc/postfix/main.cf and /etc/postfix/master.cf. A commented version of the former exists in /usr/share/postfix/main.cf.dist. Alternatively, there is the ~600k word strong man page postconf(5). The latter file is documented in master(5).

/etc/postfix/main.cf I changed the following in my main.cf
@@ -37,3 +37,9 @@
mailbox_size_limit = 0
recipient_delimiter = +
inet_interfaces = all
+
+home_mailbox = Mail/
+smtpd_recipient_restrictions = permit_mynetworks permit_sasl_authenticated reject_unauth_destination
+smtpd_sasl_type = dovecot
+smtpd_sasl_path = private/auth
+smtp_helo_name = my.reverse.dns.name.com
At this point, also make sure that the parameters smtpd_tls_cert_file and smtpd_tls_key_file point to the right certificate and private key file. So either change these values or replace the content of /etc/ssl/certs/ssl-cert-snakeoil.pem and /etc/ssl/private/ssl-cert-snakeoil.key. The home_mailbox parameter sets the default path for incoming mail. Since there is no leading slash, this puts mail into $HOME/Mail for each user. The trailing slash is important as it specifies "qmail-style delivery", which means maildir. The default of the smtpd_recipient_restrictions parameter is permit_mynetworks reject_unauth_destination, so this just adds the permit_sasl_authenticated option, which allows users to send email once they have successfully authenticated through dovecot. It has to come before reject_unauth_destination, because Postfix evaluates the restrictions from left to right and stops at the first one that permits or rejects the request; otherwise mail from authenticated users to outside domains would still be rejected. The dovecot login verification is activated through the smtpd_sasl_type and smtpd_sasl_path parameters. I found it necessary to set the smtp_helo_name parameter to the reverse DNS name of my server, because many other email servers only accept email from a server with a valid reverse DNS entry. My hosting provider charges USD 7.50 per month to change the default reverse DNS name, so the easy solution is to adjust the name announced in the SMTP HELO instead.
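Because Postfix evaluates smtpd_recipient_restrictions in order and stops at the first restriction that reaches a decision, the position of permit_sasl_authenticated matters. The toy Python model below is a hypothetical simplification, not Postfix code: the request fields are made-up names, and each restriction is reduced to one boolean test. It shows that an authenticated user relaying to an outside domain is rejected when reject_unauth_destination comes first.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical, heavily simplified view of one SMTP request.
Request = Dict[str, bool]

# Each restriction answers "OK", "REJECT", or None (no decision: try the next).
RESTRICTIONS: Dict[str, Callable[[Request], Optional[str]]] = {
    "permit_mynetworks":
        lambda r: "OK" if r["client_in_mynetworks"] else None,
    "permit_sasl_authenticated":
        lambda r: "OK" if r["sasl_authenticated"] else None,
    # "Ours" stands for a domain this server is the destination or relay for.
    "reject_unauth_destination":
        lambda r: None if r["recipient_domain_is_ours"] else "REJECT",
}


def evaluate(restrictions: List[str], request: Request) -> str:
    """Return the decision of the first restriction that makes one."""
    for name in restrictions:
        decision = RESTRICTIONS[name](request)
        if decision is not None:
            return decision
    return "OK"  # end of the list reached without a decision


# An authenticated user on a laptop sending to an outside domain.
laptop = {"client_in_mynetworks": False, "sasl_authenticated": True,
          "recipient_domain_is_ours": False}
print(evaluate(["permit_mynetworks", "reject_unauth_destination",
                "permit_sasl_authenticated"], laptop))  # REJECT
print(evaluate(["permit_mynetworks", "permit_sasl_authenticated",
                "reject_unauth_destination"], laptop))  # OK
```

Unauthenticated outsiders trying to relay are rejected with either order, so moving permit_sasl_authenticated forward does not open the server as a relay.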

/etc/postfix/master.cf The file master.cf is used to enable the submission service. The following diff just removes the comment character from the appropriate section.
@@ -13,12 +13,12 @@
#smtpd pass - - - - - smtpd
#dnsblog unix - - - - 0 dnsblog
#tlsproxy unix - - - - 0 tlsproxy
-#submission inet n - - - - smtpd
-# -o syslog_name=postfix/submission
-# -o smtpd_tls_security_level=encrypt
-# -o smtpd_sasl_auth_enable=yes
-# -o smtpd_client_restrictions=permit_sasl_authenticated,reject
-# -o milter_macro_daemon_name=ORIGINATING
+submission inet n - - - - smtpd
+ -o syslog_name=postfix/submission
+ -o smtpd_tls_security_level=encrypt
+ -o smtpd_sasl_auth_enable=yes
+ -o smtpd_client_restrictions=permit_sasl_authenticated,reject
+ -o milter_macro_daemon_name=ORIGINATING
#smtps inet n - - - - smtpd
# -o syslog_name=postfix/smtps
# -o smtpd_tls_wrappermode=yes
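To check the submission service from a client, Python's standard smtplib is enough. The following sketch is an illustration under assumptions: the function names are hypothetical, and because ssl.create_default_context() verifies the server certificate, it only works once postfix uses a certificate the client trusts instead of the snakeoil one.

```python
import smtplib
import ssl
from email.message import EmailMessage


def build_message(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_submission(host: str, user: str, password: str, msg: EmailMessage) -> None:
    """Send msg through the submission service on port 587."""
    context = ssl.create_default_context()  # verifies the server certificate
    with smtplib.SMTP(host, 587) as smtp:
        smtp.starttls(context=context)  # TLS is mandatory (smtpd_tls_security_level=encrypt)
        smtp.login(user, password)      # SASL login, checked by dovecot
        smtp.send_message(msg)


# Usage (hypothetical host name and credentials):
# msg = build_message("josch@example.org", "friend@example.net", "test", "hello")
# send_via_submission("mail.example.org", "josch", "secret", msg)
```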

Dovecot Since the above configuration changes make postfix store email in a different location and format than the default, dovecot has to be informed about these changes as well. This is done in /etc/dovecot/conf.d/10-mail.conf. The second configuration change, in /etc/dovecot/conf.d/10-master.conf, enables postfix to authenticate users through dovecot. For SSL, one should look into /etc/dovecot/conf.d/10-ssl.conf and either adapt the parameters ssl_cert and ssl_key or store the correct certificate and private key in /etc/dovecot/dovecot.pem and /etc/dovecot/private/dovecot.pem, respectively. The dovecot-core package (which dovecot-imapd depends on) ships tons of documentation. The file /usr/share/doc/dovecot-core/dovecot/documentation.txt.gz gives an overview of what resources are available. The path /usr/share/doc/dovecot-core/dovecot/wiki contains a snapshot of the dovecot wiki at http://wiki2.dovecot.org/. The example configurations seem to be the same files as in /etc/, which are already well commented.

/etc/dovecot/conf.d/10-mail.conf The following diff changes the default email location in /var/mail to a maildir in ~/Mail as configured for postfix above.
@@ -27,7 +27,7 @@
#
# <doc/wiki/MailLocation.txt>
#
-mail_location = mbox:~/mail:INBOX=/var/mail/%u
+mail_location = maildir:~/Mail

# If you need to set multiple mailbox locations or want to change default
# namespace settings, you can do it by defining namespace sections.

/etc/dovecot/conf.d/10-master.conf And this enables the authentication socket for postfix:
@@ -93,9 +93,11 @@


# Postfix smtp-auth
-  #unix_listener /var/spool/postfix/private/auth {
-  #  mode = 0666
-  #}
+  unix_listener /var/spool/postfix/private/auth {
+    mode = 0660
+    user = postfix
+    group = postfix
+  }

# Auth process is run as this user.
#user = $default_internal_user

Aliases Now email will automatically be put into the ~/Mail directory of the recipient. So a user has to be created for each person who should receive mail...
$ adduser josch
...and any aliases for it to be configured in /etc/aliases.
@@ -1,2 +1,4 @@
-# See man 5 aliases for format
-postmaster: root
+root: josch
+postmaster: josch
+hostmaster: josch
+webmaster: josch
After editing /etc/aliases, the command
$ newaliases
has to be run. More can be read in the aliases(5) man page.
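Aliases are expanded recursively, so an alias may point to another alias, and loops have to be detected. The Python sketch below illustrates that expansion on the simple "name: target, target" lines used above. The function names are hypothetical, and continuation lines as well as the :include:, |command and /file targets described in aliases(5) are deliberately not handled.

```python
from typing import Dict, FrozenSet, List


def parse_aliases(text: str) -> Dict[str, List[str]]:
    """Parse simple 'name: target, target' lines; '#' starts a comment."""
    aliases: Dict[str, List[str]] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, _, targets = line.partition(":")
        aliases[name.strip().lower()] = [
            t.strip() for t in targets.split(",") if t.strip()]
    return aliases


def expand(name: str, aliases: Dict[str, List[str]],
           path: FrozenSet[str] = frozenset()) -> List[str]:
    """Recursively expand name into the local users that receive its mail."""
    key = name.lower()
    if key not in aliases or key in path:
        # A real user, or an alias loop: deliver to the name itself.
        return [name]
    recipients: List[str] = []
    for target in aliases[key]:
        for user in expand(target, aliases, path | {key}):
            if user not in recipients:
                recipients.append(user)
    return recipients


ALIASES = """\
# See man 5 aliases for format
root: josch
postmaster: josch
hostmaster: josch
webmaster: josch
"""
print(expand("postmaster", parse_aliases(ALIASES)))  # ['josch']
```

Tracking the current expansion path (rather than one global "seen" set) keeps two branches that lead to the same alias from being mistaken for a loop.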

Finishing up Everything is done and now postfix and dovecot have to be informed about the changes. There are many ways to do that. Either restart the services, reboot or just do:
$ postfix reload
$ doveadm reload
Have fun!

7 November 2014

Johannes Schauer: automatically suspending cpu hungry applications

TLDR: Using the awesome window manager, how to automatically send SIGSTOP and SIGCONT to application windows when they get unfocused or focused, respectively, so that the application does not waste CPU cycles when not in use. I don't require any fancy looking GUI, so my desktop runs no full-blown desktop environment like Gnome or KDE but only awesome as a light-weight window manager. Usually, the only application windows I have open are rxvt-unicode as my terminal emulator and firefox/iceweasel with the pentadactyl extension as my browser. Thus, I would expect the CPU usage of my idle system to be pretty much zero, but instead firefox constantly eats 10-15%. Probably to update some GIF animations or JavaScript (or nowadays even HTML5 video animations). But I don't need it to do that when I'm not currently looking at my browser window. Disabling all JavaScript is not an option because some websites that I need for uni or work are completely broken without JavaScript, so I have to enable it for those websites. Solution: send SIGSTOP when my firefox window loses focus and send SIGCONT once it gains focus again. The following addition to my /etc/xdg/awesome/rc.lua does the trick:
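-- awesome's C API objects used below (awful is already required in rc.lua)
local capi = { timer = timer }

-- Resume Iceweasel as soon as one of its windows gets the focus.
client.add_signal("focus", function(c)
    if c.class == "Iceweasel" then
        awful.util.spawn("kill -CONT " .. c.pid)
    end
end)

-- Stop Iceweasel 10 seconds after it lost the focus, unless it got it back.
client.add_signal("unfocus", function(c)
    if c.class == "Iceweasel" then
        local timer_stop = capi.timer { timeout = 10 }
        local send_sigstop = function ()
            timer_stop:stop()
            -- client.focus is nil if no window has the focus.
            local focused = client.focus
            if not focused or focused.pid ~= c.pid then
                awful.util.spawn("kill -STOP " .. c.pid)
            end
        end
        timer_stop:add_signal("timeout", send_sigstop)
        timer_stop:start()
    end
end)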
Since I'm running Debian, the class is "Iceweasel" and not "Firefox". When the window gains focus, a SIGCONT is sent immediately. I'm executing kill because I don't know how to send UNIX signals from lua directly. When the window loses focus, the SIGSTOP signal is only sent after a 10 second timeout. This is done for several reasons: With this change, when I now open htop, the process consuming the most CPU resources is htop itself. Success! Another cool advantage is that firefox can now be moved completely into swap space in case I run otherwise memory hungry applications, without ever requiring any memory from swap until I really use it again. I haven't encountered any disadvantages of this setup yet. If 10 seconds prove to be too short to copy and paste, I can easily extend this delay. Even clicking on links in my terminal works flawlessly: the new tab will just only load once firefox gets focused again. EDIT: thanks to Helmut Grohne for suggesting to compare the pid instead of the raw client instance to prevent misbehaviour when firefox opens additional windows like the preferences dialog.

29 July 2014

Johannes Schauer: bootstrap.debian.net temporarily not updated

I'll be moving places twice within the next month and as I'm hosting the machine that generates the data, I'll temporarily suspend the bootstrap.debian.net service until maybe around September. Until then, bootstrap.debian.net will not be updated and retain the status as of 2014-07-28. Sorry if that causes any inconvenience. You can write to me if you need help with manually generating the data bootstrap.debian.net provided.

5 June 2014

Johannes Schauer: botch updates

My last update about ongoing development of botch, the bootstrap/build ordering tool chain, was four months ago and about several incremental updates. This post will be of similar nature. The most interesting news is probably the additional data that bootstrap.debian.net now provides. This is listed in the next section. All subsequent sections then list the changes under the hood that made the additions to bootstrap.debian.net possible.

bootstrap.debian.net The bootstrap.debian.net service used to have botch as a git submodule but now runs botch from its Debian package. This at least proves that the botch Debian package is mature enough to do useful stuff with it. In addition to the bootstrapping results by architecture, bootstrap.debian.net now also hosts the following additional services: Further improvements concern how dependency cycles are now presented in the html overviews. While before, vertices in a cycle were separated by commas as if they were simple package lists, vertices are now connected by unicode arrows. Dashed arrows indicate build dependencies while solid arrows indicate builds-from relationships. For what it's worth, installation set vertices now contain their installation set in their title attribute.

Debian package Botch has long depended on features of an unreleased version of dose3 which in turn depended on an unreleased version of libcudf. Both projects have recently made new releases, so I was now able to drop the dose3 git submodule and rely on the host system's dose3 version instead. This also made it possible to create a Debian package of botch which currently sits at Debian mentors. Writing the package also finally made me create a usable install target in the Makefile as well as add stubs for the manpages of the 44 applications that botch currently ships. The actual content of these manpages still has to be written. The only documentation botch currently ships in the botch-doc package is an offline version of the wiki on gitorious. The new page ExamplesGraphs even includes pictures.

Cross By default, botch analyzes the native bootstrapping phase. That is, assume that the initial set of Essential:yes and build-essential packages magically exists and find out how to bootstrap the rest from there through native compilation. But part of the bootstrapping problem is also to create the set of Essential:yes and build-essential packages from nothing via cross compilation. Botch is unable to analyze the cross phase because too many packages cannot satisfy their crossbuild dependencies due to multiarch conflicts. This problem is only about the dependency metadata and not about whether a given source package actually crosscompiles fine in practice. Helmut Grohne has done great work with rebootstrap which is regularly run by jenkins.debian.net. He convinced me that we need an overview of what packages are blocking the analysis of the cross case and that it was useful to have a crossbuild order even if that was a fake order just to have a rough overview of the current situation in Debian Sid. I wrote a couple of scripts which would run dose-builddebcheck on a repository, analyze which packages fail to satisfy their crossbuild dependencies and why, fix those cases by adjusting package metadata accordingly and repeat until all relevant source packages satisfy their crossbuild dependencies. The result of this can then be used to identify the packages that need to be modified as well as to generate a crossbuild order. The fixes to the metadata are done in an automatic fashion and do not necessarily reflect the real fix that would solve the problem. Nevertheless, I ended up agreeing that it is better to have a slightly wrong overview than no overview at all.
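The check, fix and repeat loop described above is a fixpoint iteration. The Python sketch below shows only its shape: first_failures is a hypothetical stand-in for dose-builddebcheck that, like a solver giving up early, reports just the first unsatisfiable crossbuild dependency per source package, and the "fix" merely marks a package as satisfiable, much like the automatic metadata fixes that need not match the real fix. All names and data are made up.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical toy data: source package -> its crossbuild dependencies.
Sources = Dict[str, List[str]]


def first_failures(sources: Sources, satisfiable: Set[str]) -> Dict[str, str]:
    """Stand-in for dose-builddebcheck: report only the first
    unsatisfiable crossbuild dependency of each source package."""
    failures: Dict[str, str] = {}
    for src, deps in sources.items():
        for dep in deps:
            if dep not in satisfiable:
                failures[src] = dep
                break
    return failures


def fix_until_satisfied(sources: Sources, satisfiable: Iterable[str],
                        max_rounds: int = 100) -> List[str]:
    """Check, fake-fix the reported packages, and repeat until nothing fails.

    Returns the packages whose metadata was adjusted, in the order found.
    """
    ok = set(satisfiable)
    adjusted: List[str] = []
    for _ in range(max_rounds):
        failures = first_failures(sources, ok)
        if not failures:
            return adjusted
        for dep in sorted(set(failures.values())):
            # Fake fix: pretend dep's metadata (e.g. a Multi-Arch field)
            # was corrected. This need not be the real fix.
            ok.add(dep)
            adjusted.append(dep)
    raise RuntimeError("no fixpoint reached")


sources = {"src:a": ["libx-dev", "liby-dev"], "src:b": ["libz-dev"]}
print(fix_until_satisfied(sources, {"libz-dev"}))  # ['libx-dev', 'liby-dev']
```

Here src:a needs two rounds: liby-dev only shows up as a problem after libx-dev has been "fixed", which is why a single pass is not enough.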

Minimizing the dependency graph size Installation sets in the dependency graph are calculated independently of each other. If two binary packages provide A, then dependencies on A in different installation sets might choose different binary packages as providers of A. The same holds for disjunctive dependencies. If a package depends on A | C and another package depends on C | A, then there is no coordination to choose C so as to minimize the overall number of vertices in the graph. I implemented two methods to minimize the impact of cases where the dependency solver has multiple options to satisfy a dependency through Provides and dependency disjunctions. The first method is inspired by Helmut Grohne. An algorithm goes through all disjunctive binary dependencies and removes all virtual packages, leaving only real packages. Of the remaining real packages, the first one is selected. For build dependencies, the algorithm drops all but the first package in every disjunction. This is also what sbuild does. Unfortunately this solution produces an unsatisfiable dependency situation in most cases. This is because oftentimes it is necessary to select the virtual disjunctive dependency because of a conflict relationship introduced by another package. The second method involves aspcud, a cudf solver which can optimize a solution by a criterion. This solution is based on an idea by Pietro Abate, who implemented the basis for this idea back in 2012. In contrast to a usual cudf problem, binary packages now also depend on the source packages they build from. If we now ask aspcud to find an installation set for one of the base source packages (I chose src:build-essential), then it will return an installation set that includes source packages. As an optimization criterion, the number of source packages in the installation set is minimized. This solution would be flawless if there were no conflicts between binary packages. 
Due to conflicts, not all binary packages that must be coinstallable for this strategy to work can be coinstalled. The quick and dirty solution is to remove all conflicts before passing the cudf universe to aspcud. But this also means that the solution sometimes does not work in practice.
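The first method can be sketched in a few lines of Python on pre-parsed dependency fields (version constraints are ignored). The function names are hypothetical, and what to do when a disjunction contains only virtual packages is not stated above; keeping such a clause unchanged is an assumption of this sketch.

```python
from typing import List, Set

# A dependency field as a list of clauses; each clause lists the
# alternatives of one disjunction, e.g. "A | C, B" -> [["A", "C"], ["B"]].
Deps = List[List[str]]


def prune_binary_deps(deps: Deps, real_packages: Set[str]) -> Deps:
    """Remove virtual alternatives, then keep only the first real one."""
    pruned: Deps = []
    for clause in deps:
        real = [alt for alt in clause if alt in real_packages]
        # Assumption: if no real package is left, keep the clause as is.
        pruned.append(real[:1] if real else clause)
    return pruned


def prune_build_deps(deps: Deps) -> Deps:
    """Keep only the first alternative of every disjunction."""
    return [clause[:1] for clause in deps]


real = {"A", "B", "C"}
print(prune_binary_deps([["A", "C"], ["mail-transport-agent"], ["virt-x", "C", "A"]], real))
# [['A'], ['mail-transport-agent'], ['C']]
print(prune_build_deps([["C", "A"], ["B"]]))  # [['C'], ['B']]
```

Nothing here looks at Conflicts, which is exactly why this method often ends in an unsatisfiable situation: the dropped alternative may have been the only one compatible with another package.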

Test cases Botch now finally has a test target in its Makefile. The test target tests two code paths of the native.sh script and the cross.sh script. Running these two scripts covers testing most parts of botch. Given that I did lots of refactoring in the past weeks, the test cases greatly helped to ensure that I didn't break anything in the process. I also added autopkgtests to the Debian packaging which test the same things as the test target but naturally run the installed version of botch instead. The autopkgtests were a great help in weeding out some last bugs which made botch depend on being executed from its source directory.

Python 3 Reading the suggestions in the Debian python policy I evaluated the possibility to use Python 3 for the Python scripts in botch. While I was at it I added transparent decompression for gzip, bz2 and xz based on the file magic, replaced python-apt with python-debian because of bug#748922 and added argparse argument parsing to all scripts. Unfortunately I had to find out that Python 3 support does not yet seem to be possible for botch for the following reasons:
  • no soap module for Python 3 in Debian (needed for bts access)
  • hash randomization is turned on by default in Python 3 and therefore the graph output of networkx is not deterministic anymore (bug#749710)
Thus I settled for changing the code such that it would be compatible with Python 2 as well as with Python 3. Because of the changed string handling and sys.stdout properties in Python 3 this proved to be tricky. On the other hand this showed me bugs in my code where I was wrongly relying on deterministic dictionary key traversal.
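Transparent decompression based on the file magic needs only the first few bytes of a file. The Python sketch below uses the standard gzip, bz2 and lzma modules; the function name is hypothetical, and unlike botch's scripts it is Python 3 only, since lzma is not part of Python 2's standard library.

```python
import bz2
import gzip
import lzma
from typing import IO

# Leading magic bytes of the three compressed formats.
MAGIC = (
    (b"\x1f\x8b", gzip.open),       # gzip
    (b"BZh", bz2.open),             # bzip2
    (b"\xfd7zXZ\x00", lzma.open),   # xz
)


def open_maybe_compressed(path: str) -> IO[bytes]:
    """Open path for binary reading, decompressing transparently if the
    file starts with gzip, bzip2 or xz magic bytes."""
    with open(path, "rb") as f:
        head = f.read(6)
    for magic, opener in MAGIC:
        if head.startswith(magic):
            return opener(path, "rb")
    return open(path, "rb")
```

Checking the content instead of the file extension means a Packages file is read correctly whether or not it was renamed after compression.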

3 April 2014

Johannes Schauer: mapbender - maps for long-distance travels

Back in 2007 I stumbled over the "Plus Fours Routefinder", an invention of the 1920s. It's worn on the wrist and allows the user to scroll through a map of the route they planned to take, rolled up on little wooden rollers. At that point I thought: that's awesome for long trips where you either don't want to take electronics with you or where you are without any electricity for a long time. And creating such rollable custom maps of your route automatically using openstreetmap data should be a breeze! Nevertheless it seems nobody picked up the idea. Years passed, and in a few weeks I'll go on a biking trip along the Weser, a river in northern Germany. For my last multi-day trip (which was through the Odenwald, an area in southern Germany) I printed a big map from openstreetmap data which contained the whole route. Openstreetmap data is fantastic for this because, in contrast to commercial maps, it not only allows you to print just the area you need but also to highlight your planned route and objects you would probably not find in most commercial maps, for example supermarkets to stock up on supplies or bicycle repair shops. Unfortunately, to show everything along your route in the amount of detail that you want, such maps have to be pretty huge and thus easily become an inconvenience, because the local plotter can't handle paper as large as DIN A0 or because it's a pain to repeatedly fold and unfold the whole thing every time you want to look at it. Strong winds are also no fun with a huge sheet of paper in your hands. One solution would be to print DIN A4 sized map regions in the desired scale. But that has the disadvantage that either you find yourself going back and forth between subsequent pages because you happen to be right at the border between two pages, or you have to print sufficiently large overlaps, resulting in many duplicate map pieces and more pages of paper than you would like to carry with you. 
It was then that I remembered the "Plus Fours Routefinder" concept. Given a predefined route it only shows what's important to you: all things close to the route you plan to travel along. Since it's a long continuous roll of paper there is no problem with folding because as you travel along the route you unroll one end and roll up the other. And because it's a long continuous map there is also no need for flipping pages or large overlap regions. There is not even the problem of not finding a big enough sheet of paper because multiple DIN A4 sheets can easily be glued together at their ends to form a long roll. On the left you see the route we want to take: the bicycle route along the Weser river. If I wanted to print that map on a scale that allows me to see objects in sufficient detail along our route, then I would also see objects in Hamburg (upper right corner) in the same amount of detail. Clearly a waste of ink and paper as the route is never even close to Hamburg.
As the first step, a smooth approximation of the route has to be found. It seems that the best way to do that is to calculate a B-Spline curve approximating the input data with a given smoothness. On the right you can see the approximated curve with a smoothing value of 6. The curve is sampled into 20 linear segments. I calculated the B-Spline using the FITPACK library to which scipy offers a Python binding.
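The spline fit itself is done with FITPACK through scipy (scipy.interpolate.splprep and splev) and is not reproduced here. Once the fitted curve has been evaluated densely, it still has to be cut into straight segments. The post does not say how the 20 sample points are spaced; this Python sketch assumes equal arc length, and the function name is hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def resample(curve: List[Point], n_segments: int) -> List[Point]:
    """Return n_segments + 1 points spaced at equal arc length along a
    densely sampled curve (a polyline with at least two points)."""
    # Cumulative arc length at each input vertex.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    out: List[Point] = []
    j = 0
    for i in range(n_segments + 1):
        target = total * i / n_segments
        # Advance to the input segment containing the target length.
        while j < len(curve) - 2 and cum[j + 1] < target:
            j += 1
        seg = cum[j + 1] - cum[j]
        t = 0.0 if seg == 0 else (target - cum[j]) / seg
        (x0, y0), (x1, y1) = curve[j], curve[j + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


print(resample([(0, 0), (10, 0), (10, 10)], 4))
# [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0), (10.0, 5.0), (10.0, 10.0)]
```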
The next step is to expand each of the line segments into quadrilaterals. The distance between the vertices of the quadrilaterals and the ends of the line segment they belong to is the same along the whole path and obviously has to be big enough such that every point along the route falls into one quadrilateral. In this example, I draw only 20 quadrilaterals for visual clarity. In practice one wants many more for a smoother approximation.
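One way to build such quadrilaterals, sketched in Python: place the corners at distance d on both sides of every sample point, along the average of the normals of the segments meeting there, so that neighbouring quadrilaterals share an edge and leave no gaps. This corner placement is an assumption of the sketch (the post only requires the same distance everywhere), and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Quad = Tuple[Point, Point, Point, Point]


def _unit_normal(a: Point, b: Point) -> Point:
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    return (-dy / length, dx / length)  # normal pointing to the left


def quadrilaterals(points: List[Point], d: float) -> List[Quad]:
    """Expand each segment points[i] -> points[i+1] into a quadrilateral
    whose corners lie at distance d from the segment ends.

    Corner order: start-left, start-right, end-right, end-left.
    A hairpin turn (opposite normals) would divide by zero; a smoothed
    route is assumed not to contain one.
    """
    seg_normals = [_unit_normal(a, b) for a, b in zip(points, points[1:])]
    vertex_normals = []
    for i in range(len(points)):
        # Average the normals of the (one or two) segments meeting here.
        adjacent = seg_normals[max(i - 1, 0):i + 1]
        nx = sum(n[0] for n in adjacent)
        ny = sum(n[1] for n in adjacent)
        length = math.hypot(nx, ny)
        vertex_normals.append((nx / length, ny / length))
    left = [(p[0] + d * n[0], p[1] + d * n[1]) for p, n in zip(points, vertex_normals)]
    right = [(p[0] - d * n[0], p[1] - d * n[1]) for p, n in zip(points, vertex_normals)]
    return [(left[i], right[i], right[i + 1], left[i + 1])
            for i in range(len(points) - 1)]
```

Using one shared normal per sample point is what makes the end edge of one quadrilateral identical to the start edge of the next.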
Using a simple transform, each point of the original map and the original path in each quadrilateral is then mapped to a point inside the corresponding "straight" rectangle. Each target rectangle has the height of the line segment it corresponds to. It can be seen that while the large-scale curvature of the path is lost in the result, fine details remain perfectly visible. The assumption here is that, while travelling a path several hundred kilometers long, it does not matter that the large-scale curvature, which one cannot perceive anyway, is not preserved. The transformation is done on a Mercator projection of the map itself as well as of the path data. Therefore, this method probably doesn't work if you plan to travel to one of the poles. Currently I transform openstreetmap bitmap data. This is not quite optimal, as it leads to text on the map being distorted. It would be just as easy to apply the necessary transformations to raw openstreetmap XML data, but unfortunately I didn't find a way to render the resulting transformed map data as a raster image without setting up a database. I would've thought that it would be possible to have a standalone program reading openstreetmap XML and dumping out raster or svg images without a round trip through a database. Furthermore, tilemill, which seems to be one of the least cumbersome programs to set up for producing raster images, is stuck in an ITP, and the existing packaging attempt fails to produce a non-empty binary package. Since I have no clue about nodejs packaging, I wrote about this to the pkg-javascript-devel list. Maybe I can find a kind soul to help me with it.
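The post does not say which "simple transform" is used. One common choice for mapping a point inside an arbitrary convex quadrilateral onto a rectangle is to invert bilinear interpolation, solving for the quadrilateral's (u, v) coordinates with Newton's method. The Python sketch below takes that approach; it is a swapped-in technique, not necessarily the author's, and the function names and corner order (start-left, start-right, end-right, end-left) are assumptions.

```python
from typing import Tuple

Point = Tuple[float, float]
Quad = Tuple[Point, Point, Point, Point]


def quad_to_unit_square(q: Quad, p: Point, iterations: int = 20,
                        tol: float = 1e-12) -> Tuple[float, float]:
    """Invert P(u, v) = (1-u)(1-v)A + u(1-v)B + uvC + (1-u)vD for point p.

    u runs across the route (A -> B), v along it (A -> D).
    """
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = q
    u, v = 0.5, 0.5  # start in the middle of the quadrilateral
    for _ in range(iterations):
        # Residual F(u, v) = P(u, v) - p
        fx = (1-u)*(1-v)*x0 + u*(1-v)*x1 + u*v*x2 + (1-u)*v*x3 - p[0]
        fy = (1-u)*(1-v)*y0 + u*(1-v)*y1 + u*v*y2 + (1-u)*v*y3 - p[1]
        # Partial derivatives dP/du and dP/dv
        pux = (1-v)*(x1 - x0) + v*(x2 - x3)
        puy = (1-v)*(y1 - y0) + v*(y2 - y3)
        pvx = (1-u)*(x3 - x0) + u*(x2 - x1)
        pvy = (1-u)*(y3 - y0) + u*(y2 - y1)
        det = pux*pvy - pvx*puy
        # Newton step: solve J [du, dv] = -F with Cramer's rule
        du = (-fx*pvy + fy*pvx) / det
        dv = (-fy*pux + fx*puy) / det
        u, v = u + du, v + dv
        if abs(du) < tol and abs(dv) < tol:
            break
    return u, v


def to_strip(q: Quad, p: Point, width: float, offset: float,
             length: float) -> Point:
    """Map p into the straight rectangle of its segment: x across the route
    in [0, width], y along it in [offset, offset + length], where length
    is the segment length and offset the length of all previous segments."""
    u, v = quad_to_unit_square(q, p)
    return (u * width, offset + v * length)
```

Because the bilinear map is exact at the four corners, neighbouring rectangles line up along their shared edges just as the quadrilaterals do.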
Here is a side-by-side overview that doesn't include the underlying map data but only the path. It shows how small details are preserved in the transformation. The code that produced the images in this post is very crude, unoptimized and kinda messy. If you don't care, it can be accessed here

6 February 2014

Johannes Schauer: botch updates

My last update about ongoing development of botch, the bootstrap/build ordering tool chain, was three months ago with the announcement of bootstrap.debian.net. Since then a number of things happened, so I thought an update was due.

New graphs for port metrics By default, a dependency graph is created by arbitrarily choosing an installation set for binary package installation or source package compilation. Installation set vertices and source vertices are connected according to this arbitrary selection. Niels Thykier approached me at Debconf13 about the possibility of using this graph to create a metric which would be able to tell, for each source package, how many other source packages would become uncompilable or how many binary packages would become uninstallable if that source package was removed from the archive. This could help decide the importance of packages. More about this can be found in the thread on debian-devel. For botch, this meant that two new graphs can now be generated. Instead of picking an arbitrary installation set for compiling a source package or installing a binary package, botch can now create a minimum graph, by letting dose3 calculate strong dependencies, and a maximum graph, by using the dependency closure.
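The difference between the two graphs can be illustrated on a toy repository. In the Python sketch below, the dependency closure follows every alternative of every dependency, while a strong dependency of a package is one without which the package becomes uninstallable. This is only an illustration with hypothetical names and data: it ignores conflicts, which dose3 does take into account, and without conflicts installability can be computed as a greatest fixpoint.

```python
from typing import Dict, List, Set

# Toy repository: package -> list of clauses, each clause a list of
# alternatives. Conflicts are ignored (dose3 handles them; this does not).
Repo = Dict[str, List[List[str]]]


def closure(repo: Repo, pkg: str) -> Set[str]:
    """Every package reachable through any alternative of any dependency."""
    seen: Set[str] = set()
    stack = [pkg]
    while stack:
        p = stack.pop()
        if p in seen or p not in repo:
            continue
        seen.add(p)
        for clause in repo[p]:
            stack.extend(clause)
    return seen


def installable(repo: Repo) -> Set[str]:
    """Greatest fixpoint: drop packages with a clause that has no remaining
    alternative until nothing changes. Valid only without conflicts."""
    ok = set(repo)
    changed = True
    while changed:
        changed = False
        for p in list(ok):
            if any(not any(alt in ok for alt in clause) for clause in repo[p]):
                ok.discard(p)
                changed = True
    return ok


def strong_dependencies(repo: Repo, pkg: str) -> Set[str]:
    """Packages whose removal makes pkg uninstallable."""
    if pkg not in installable(repo):
        return set()
    result: Set[str] = set()
    for q in closure(repo, pkg) - {pkg}:
        reduced = {p: deps for p, deps in repo.items() if p != q}
        if pkg not in installable(reduced):
            result.add(q)
    return result


repo = {"app": [["libc"], ["mta-a", "mta-b"]], "libc": [],
        "mta-a": [["libc"]], "mta-b": [["libc"]]}
print(sorted(closure(repo, "app")))              # ['app', 'libc', 'mta-a', 'mta-b']
print(sorted(strong_dependencies(repo, "app")))  # ['libc']
```

Counting, for a package q, how many packages have q as a strong dependency gives the kind of "what breaks if this is removed" number the metric is after.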

Build profile syntax in dpkg

With dpkg 1.17.2 we now have experimental build profile support in unstable. The syntax which ended up being added was:
Build-Depends: large (>= 1.0), small <!profile.stage1>
But before packages using that syntax can hit the archive, a few more tools need to understand it. The patch we have for sbuild is very simple because sbuild relies on libdpkg for dependency parsing. We have a patch for apt too, but we have to rebase it onto the current apt version and adapt it so that it behaves exactly like the functionality dpkg implements. Before we can do that, though, we have to decide how to handle mixed positive and negative qualifiers, or whether to remove this feature altogether because it causes too much confusion. The relevant thread on debian-dpkg starts here.
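As an illustration of what each tool has to understand, here is a minimal parser sketch for the experimental syntax shown above. It is not the dpkg, sbuild or apt code. It assumes the list semantics of architecture qualifiers (a positive list applies if any listed profile is active, a negated list applies if none is), ignores alternatives ("a | b") and architecture qualifiers, and refuses mixed qualifiers since their meaning was still undecided.

```python
import re

# One dependency: name, optional "(op version)", optional "<restrictions>".
DEP_RE = re.compile(
    r"^\s*(?P<name>[a-z0-9][a-z0-9+.-]*)\s*"
    r"(?:\((?P<version>[^)]*)\))?\s*"
    r"(?:<(?P<restrictions>[^>]*)>)?\s*$"
)

def parse_build_depends(field):
    """Split a Build-Depends value into (name, version, restriction terms)."""
    deps = []
    for item in field.split(","):
        match = DEP_RE.match(item)
        if match is None:
            raise ValueError(f"cannot parse dependency: {item!r}")
        terms = (match.group("restrictions") or "").split()
        deps.append((match.group("name"), match.group("version"), terms))
    return deps

def applies(terms, active_profiles):
    """Decide whether a dependency is needed for the given build profiles."""
    if not terms:
        return True
    negated = [term.startswith("!") for term in terms]
    if any(negated) and not all(negated):
        # How to interpret mixed positive and negative qualifiers was undecided.
        raise ValueError("mixed positive and negative qualifiers")
    names = {term.lstrip("!").removeprefix("profile.") for term in terms}
    if all(negated):
        return not (names & active_profiles)
    return bool(names & active_profiles)
```

With the example line, a normal build needs both large and small, while a build with the stage1 profile active only needs large.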

Update to latest dose3 git

Botch heavily depends on libdose3 and unfortunately requires features which are only available in the current git HEAD. The latest version packaged in Debian is 3.1.3 from October 2012. Worse, the current dose3 git HEAD also relies on unreleased features from libcudf. On top of that, the GraphML output of the latest ocamlgraph release (1.8.3) is broken and only fixed in git. For now everything is set up as git submodules, but this is the major blocker preventing botch from being packaged for Debian. Hopefully new releases of all involved components will be made soon.

Writing and reading GraphML

Botch is a collection of several utilities which are connected together in a shell script. The advantage of this is that one does not need to understand or hack the OCaml code to use botch for different purposes. In theory it also allows inserting third-party tools into a pipe to further modify the data. Until recently this ability was seriously hampered by the fact that many botch tools communicated with each other through marshaled OCaml binary files, which prevent anything not written in OCaml from modifying them. The data passed around like this were the dependency graphs, and I initially implemented it like that because I didn't want to write a GraphML parser. I have now written an xmlm based GraphML parser, so as of now botch only reads and writes ASCII text files: XML for the graphs and rfc822 packages format for Packages and Sources files. Both can easily be modified by third-party tools. The ./tools directory contains many Python scripts which use the networkx module to modify GraphML and the apt_pkg module to modify rfc822 files.
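To show how little is needed to consume such graphs outside of OCaml, here is a minimal reader using only the Python standard library (networkx's read_graphml does the same with far more features). It assumes plain GraphML with node attributes declared via key elements; the exact attributes botch writes are not described here, so the "name" attribute in the example is hypothetical.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

def read_graphml(text):
    """Return ({node_id: {attribute: value}}, [(source, target), ...])."""
    root = ET.fromstring(text)
    # <key id="d0" attr.name="name"/> maps data keys to attribute names
    names = {key.get("id"): key.get("attr.name") or key.get("id")
             for key in root.findall("g:key", GRAPHML_NS)}
    graph = root.find("g:graph", GRAPHML_NS)
    nodes = {}
    for node in graph.findall("g:node", GRAPHML_NS):
        nodes[node.get("id")] = {
            names.get(data.get("key"), data.get("key")): data.text
            for data in node.findall("g:data", GRAPHML_NS)
        }
    edges = [(edge.get("source"), edge.get("target"))
             for edge in graph.findall("g:edge", GRAPHML_NS)]
    return nodes, edges
```

A filter in a pipe would read the graph like this, drop or relabel vertices, and write GraphML back out for the next tool.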

Splitting of tools

To further increase the ability to modify program execution without having to know OCaml, I split up some big tools into multiple smaller ones. Some of the smaller tools are now even written in Python, which is probably much more hackable for the general crowd. I converted to Python those tools which did not need any dose3 functionality and which were simple enough that rewriting them didn't take much time. I could convert more tools, but that might introduce bugs and takes time, which I currently don't have much of (who does?).

Gzip instead of bz2

Since around January 14, snapshot.debian.org doesn't offer bzip2 compressed Packages and Sources files anymore but uses xz instead. This is awesome for most purposes, but unfortunately I had to discover that there exist no OCaml bindings for libxz. Thus, botch now uses gzip instead of bz2 until either I or somebody else finds the time to write libxz OCaml bindings.
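For third-party tools, reading the gzip-compressed files is straightforward. Below is a minimal stdlib-only sketch of an rfc822-style stanza reader, offered as a simplified stand-in for apt_pkg rather than a replacement: it does not special-case the " ." convention for empty lines in multi-line fields. Python's standard library also has lzma.open, so the same code could read the xz files.

```python
import gzip

def parse_stanzas(lines):
    """Yield one dict per paragraph of a Packages/Sources file."""
    stanza, last_key = {}, None
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # an empty line ends the current paragraph
            if stanza:
                yield stanza
            stanza, last_key = {}, None
        elif line[0] in " \t" and last_key:
            # continuation line of a multi-line field
            stanza[last_key] += "\n" + line[1:]
        else:
            key, _, value = line.partition(":")
            last_key = key
            stanza[key] = value.strip()
    if stanza:
        yield stanza

def read_packages_gz(path):
    """Parse a gzip-compressed Packages or Sources file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return list(parse_stanzas(handle))
```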

Self hosting Fedora

Paul Wise made me aware of Harald Hoyer's attempts to bootstrap Fedora. I reproduced his steps for the Debian dependency graph, and it turns out that the resulting graphs are a little bit bigger. I'm exchanging emails with Harald Hoyer because it might not be too hard to use botch for rpm based distributions as well, since dose3 supports rpm. The article also made me aware of the tred tool, which is part of graphviz and calculates the transitive reduction of a graph. This can make horribly cluttered graphs much easier to read.
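What a transitive reduction does can be shown with a short sketch for directed acyclic graphs. This is plain Python, not tred's implementation: an edge u -> v is dropped whenever v is still reachable from u through some other path, so only the edges needed to preserve reachability remain.

```python
def transitive_reduction(edges):
    """Return the transitive reduction of a DAG given as a set of (u, v) edges."""
    successors = {}
    for u, v in edges:
        successors.setdefault(u, set()).add(v)

    def reachable_without_direct_edge(u, v):
        # depth-first search from u that never uses the edge u -> v itself
        stack = [w for w in successors.get(u, ()) if w != v]
        seen = set(stack)
        while stack:
            w = stack.pop()
            if w == v:
                return True
            for x in successors.get(w, ()):
                if x not in seen:
                    seen.add(x)
                    stack.append(x)
        return False

    return {(u, v) for u, v in edges if not reachable_without_direct_edge(u, v)}
```

For edges a -> b, b -> c and a -> c, the shortcut a -> c is removed because c is already reachable via b. On graphs with cycles the result is not unique, so this sketch only claims correctness for acyclic input.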

Dose3 bugs

I planned to generate such simplified graphs for the neighborhood of each source package on bootstrap.debian.net, but then binutils stopped building binutils-gold and instead provided binutils-gold, while libc6-dev breaks binutils-gold (<< 2.20.1-11). This unfortunately triggered a dose3 bug, and thus bootstrap.debian.net will not generate any new results until this is fixed in dose3. Another dose3 bug affects Multi-Arch: same packages which Conflicts/Replaces/Provides: bar while bar is purely virtual. Binaries of different architectures with this property currently cannot be co-installed according to dose3. Unfortunately linux-libc-dev has this property, and thus botch cannot be used to analyze cross builds until that bug is fixed in dose3. I hope I get some free time soon to look at these dose3 issues myself.

More documentation

Since I started to like the current set of tools and how they work together, I ended up writing over 2600 words of documentation in the past few days. You can start setting up and running botch by reading the first steps and get more detailed information by reading about the 28 tools that botch makes use of as of now. All existing articles, theses and talks are linked from the wiki home.

11 January 2014

Johannes Schauer: Why do I need superuser privileges when I just want to write to a regular file

I have written a number of scripts to create Debian foreign architecture (mostly armel and armhf) rootfs images for SD cards or NAND flashing. I started by putting Debian on my Openmoko gta01 and gta02 and continued with devices like the qi nanonote, a Marvell Kirkwood based device, the Always Innovating Touchbook (close to the Beagleboard), the Notion Ink Adam and most recently the Golden Delicious gta04. Once it has been manufactured, I will surely also get my hands dirty with the Neo900, whose creators are currently looking for potential donors/customers to increase the size of the first batch and get the price per unit further down. Creating a Debian rootfs disk image for all these devices basically follows the same steps:
  1. create a disk image file, partition it, format the partitions and mount the / partition into a directory
  2. use debootstrap or multistrap to extract a selection of armel or armhf packages into the directory
  3. copy over /usr/bin/qemu-arm-static for qemu user mode emulation
  4. chroot into the directory to execute package maintainer scripts with dpkg --configure -a
  5. copy the disk image onto the SD card
It was not long until I started wondering why I had to run all of the above steps with superuser privileges, even though everything except the final step (which I will not cover here) was in principle nothing more than writing some magic bytes, in more or less fancy ways, to a file I had write access to (the disk image file). So I tried using fakeroot+fakechroot, and after some initial trouble I managed to build a foreign architecture rootfs without needing root privileges for steps two, three and four. I wrote about my solution, which still included some workarounds, in another article here. These workarounds soon became unnecessary as upstream fixed the outstanding issues. As a result I wrote the polystrap tool, which combines multistrap, fakeroot, fakechroot and qemu user mode emulation. Recently I managed to integrate proot support in a separate branch of polystrap.

Last year I got the LEGO EV3 robot for Christmas, and since it runs Linux I also wanted to put Debian on it by following the instructions given by the ev3dev project. Even though ev3dev calls itself a "distribution", it only deviates from pure Debian in its kernel, some configuration options and its initial package selection. Otherwise it's vanilla Debian. The project also supplies some multistrap based scripts which create the rootfs and then partition and populate an SD card. All of this is of course done as the superuser. While the file/directory structure of the foreign Debian armel rootfs can by now easily be created without superuser privileges by running multistrap under fakeroot/fakechroot/proot, creating the SD card image still seems to be a bit more tricky. While it is no problem to write a partition table to a regular file, it turned out to be tricky to mount its partitions because tools like kpartx and losetup require superuser permissions.
Tools like mkfs.ext3 and fuse-ext2, which otherwise would be able to work on a regular file without superuser privileges, do not seem to allow specifying the offset at which a partition starts within the disk image. With fuseloop there exists a tool which allows one to "loop-mount" parts of a file in userspace to a new file, and thus allows tools like mkfs.ext3 and fuse-ext2 to work as they normally do. But fuseloop is not packaged for Debian yet and thus also not in the current Debian stable. An obvious workaround would be to create and fill each partition in a separate file and concatenate them together. But why do I have to write my data twice just because I do not want to become the superuser? Even worse: because parted refuses to write a partition table to a file which is too small to hold the specified partitions, one spends twice the disk space of the final image: the image with the partition table plus the image with the main partition's content. So let's summarize: a bootable foreign architecture SD card disk image is nothing else than a regular file representing the contents of the SD card as a block device. This disk image is created in my home directory, and given enough free disk space there is nothing stopping me from writing any possible permutation of bits to that file. Obviously I'm interested in a permutation representing a valid partition table and file systems with sensible content. Why do I need superuser privileges to generate such a sensible permutation of bits? Fortunately, the (at least in my opinion) hardest part, faking chroot and executing foreign architecture package maintainer scripts, already seems possible without superuser privileges by using fakeroot and fakechroot or proot together with qemu user mode emulation. But then there is still the blocker of creating the disk image itself, which requires some user mode loop mounting of a filesystem occupying a virtual "partition" in the disk image.
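To illustrate that the partition table really is just bytes in a regular file, here is a minimal sketch that writes a classic MBR with plain file I/O and no privileges. It is not what polystrap, multistrap or parted do internally: the CHS fields are filled with the common "use LBA" placeholder, only primary partitions are supported, and the function names are hypothetical. Populating a filesystem at the returned byte offsets is the part that still needs a tool able to work at an offset.

```python
import struct

SECTOR_SIZE = 512

def partition_entry(bootable, type_id, start_lba, num_sectors):
    """One 16-byte MBR partition entry (CHS fields set to a placeholder)."""
    chs = b"\xfe\xff\xff"
    return struct.pack("<B3sB3sII", 0x80 if bootable else 0x00,
                       chs, type_id, chs, start_lba, num_sectors)

def build_mbr(partitions):
    """446 bytes of (empty) boot code, four table entries, 0x55AA signature."""
    if len(partitions) > 4:
        raise ValueError("an MBR holds at most four primary partitions")
    table = b"".join(partition_entry(*p) for p in partitions).ljust(64, b"\x00")
    return bytes(446) + table + b"\x55\xaa"

def create_image(path, size_bytes, partitions):
    """Write the MBR into a regular file of the given size and return the
    byte offset of each partition inside the image."""
    total_sectors = size_bytes // SECTOR_SIZE
    for _, _, start, count in partitions:
        if start + count > total_sectors:
            raise ValueError("partition does not fit into the image")
    with open(path, "wb") as image:
        image.write(build_mbr(partitions))
        image.truncate(size_bytes)  # extend to full size, sparse on most filesystems
    return [start * SECTOR_SIZE for _, _, start, _ in partitions]
```

For a 64 MiB image with one bootable Linux (type 0x83) partition starting at sector 2048, create_image("sd.img", 64 * 1024 * 1024, [(True, 0x83, 2048, 129024)]) returns [1048576].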
Why has all this only become possible so very recently, and why does it still require a number of workarounds to fully work in userspace? There is a surprising number of scripts which wrap debootstrap/multistrap. Most of them require superuser privileges. Does everybody just accept that they have to put a sudo in front of every invocation and hope for the best? While this might be okay for well tested code like debootstrap and multistrap, the countless wrapper scripts might accidentally (be it through a bug in the code or a typo in the given command line arguments) write to your primary hard disk instead of your SD card. Such behavior can easily be avoided by not executing any such script with superuser privileges in the first place. Operations like loop mounting affect the whole system. Why do I have to touch anything outside of my home directory (/dev/loop in this case) to populate a file in it with some meaningful bits? Virtualization is no option because every virtualization solution again requires root privileges. One might argue that a number of solutions just require some initial setup by root and can later be used by a regular user (for example /etc/fstab configuration or the schroot approach). But then again: why do I have to write anything outside of my home directory (even if it is only once) to be able to write something meaningful to a file in it? The latter approach also does not work if one cannot become root in the first place or is limited by a virtualized environment. Imagine you are trying to build a Debian rootfs on a machine where you just have a regular user account. Or a situation I was recently in: I had a virtual server which denied me operations like loop mounting. Given all these downsides, why is it still so common to just assume that one is able and willing to use sudo and be done with it? I really wonder why technologies like fakeroot and fakechroot have only been developed this late.
Has this problem not been around since the earliest days of Linux/Unix? Am I missing something and rambling around for nothing? Is this idea a lost cause or something that is worth spending time and energy on to extend and fix the required tools?

5 November 2013

Johannes Schauer: announcing bootstrap.debian.net

The following post is a verbatim copy of my message to the debian-devel list. While botch produces loads of valuable data to help maintainers modify the right source packages with build profiles and thus make Debian bootstrappable, it has so far failed at producing this data in a format which is: While human readability is probably still lacking (it's hard to write in a manner understandable by everybody about a complicated topic you are very familiar with), the bootstrapping results are now generated automatically (on a daily basis) and published on a per-source-package basis as well. Thus let me introduce to you: bootstrap.debian.net. Paul Wise encouraged me to set this up and also donated the debian.net CNAME to my server. Thanks a lot! The data is generated daily from the midnight Packages/Sources records of snapshot.debian.org (I hope it's okay to grab the data from there on a daily basis). The resulting data can be viewed in HTML format (with some JavaScript to allow table sorting and paging, in case you use JavaScript) per architecture (here for amd64). In addition, it produces HTML pages for all source packages which are involved in a dependency cycle in any architecture (for example src:avahi or src:python2.7). Currently there are 518 source packages involved in a dependency cycle. While this number seems high, remember that estimates obtained by calculating a feedback arc set suggest that only 50-60 of these source packages have to be modified with build profiles to make the whole graph acyclic. For now it is funny to see that the main architectures do not bootstrap (since July, for different reasons) while less popular ones like ia64 and s390x bootstrap right now without problems (at least in theory). According to the logs (also accessible at the above link, here for amd64) this is because gcc-4.6 currently fails to compile due to a build-conflict (this has been reported in bug#724865).
Once this bug is fixed, all arches should be bootstrappable again. Let me remind you here that the whole analysis is done on the dependency relationships only (not a single source package is actually compiled at any point) and compilation might fail for many other reasons in practice. It was Paul Wise's idea to integrate this data into the PTS so that maintainers of affected source packages can react to the heuristics suggested by botch. For this purpose, the website also publishes the raw JSON data from which the HTML pages are generated (here for amd64). The bug report against the BTS can be found in bug#728298. I'm sure that a couple of things regarding understandability of the results are not yet sufficiently explained or outright missing. If you see any such instance, please drop me a mail suggesting what to change in the textual description or presentation of the results. I also created the following two wiki pages to give an overview of the utilized terminology: Also feel free to tell me if anything in these pages is unclear. Direct patches against the Python code producing the HTML from the raw JSON data are also always welcome.
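For readers unfamiliar with feedback arc sets: it is a set of edges whose removal makes a directed graph acyclic, and finding a minimum one is NP-hard, so estimates rely on heuristics. How botch computes its estimate is not described here; the sketch below is the well-known greedy heuristic of Eades, Lin and Smyth, shown only to illustrate the idea. It orders the vertices by repeatedly peeling off sinks and sources and otherwise picking the vertex with the largest out-degree minus in-degree; every edge pointing backwards in that order goes into the set.

```python
def greedy_feedback_arc_set(vertices, edges):
    """Return a set of edges whose removal leaves the graph acyclic."""
    succ = {v: set() for v in vertices}
    pred = {v: set() for v in vertices}
    for u, v in edges:
        if u != v:  # self-loops always have to be broken and are skipped here
            succ[u].add(v)
            pred[v].add(u)
    remaining = set(vertices)
    head, tail = [], []

    def remove(v):
        remaining.discard(v)
        for w in succ[v]:
            pred[w].discard(v)
        for w in pred[v]:
            succ[w].discard(v)

    while remaining:
        candidates = sorted(remaining)  # deterministic tie-breaking
        sink = next((v for v in candidates if not succ[v]), None)
        if sink is not None:
            tail.append(sink)
            remove(sink)
            continue
        source = next((v for v in candidates if not pred[v]), None)
        if source is not None:
            head.append(source)
            remove(source)
            continue
        best = max(candidates, key=lambda v: len(succ[v]) - len(pred[v]))
        head.append(best)
        remove(best)

    order = head + tail[::-1]
    position = {v: i for i, v in enumerate(order)}
    return {(u, v) for u, v in edges if u != v and position[u] > position[v]}
```

In a dependency graph, each returned edge corresponds to a dependency that would have to be dropped, for example with a build profile, so the number of distinct source packages owning those edges gives a rough estimate of how many packages need modification.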

11 July 2013

Johannes Schauer: Using botch to generate transition build orders

In October 2012, Joachim Breitner asked me whether botch (well, it didn't have a name back then) could also be used to calculate a build order for recompiling ~450 haskell packages with a new ghc version (it was probably the 7.6.1 release) to upload them to experimental. What still prevents this use is that botch cannot read *.dsc files directly and instead has to rely on Packages and Sources files. On the other hand (in case there exists a set of Packages and Sources files) it is now much easier to use botch for such a use case, for which it was not originally designed. To demonstrate how botch can be used to calculate the build order for library transitions, I wrote the script create-transition-order.sh, which executes the individual tools in the correct order with the correct arguments. To validate the correctness of botch, I compared the output to the order which ben produces for transitions. The create-transition-order.sh script is called with a ben query string and optionally a snapshot.debian.org timestamp as its arguments. The script relies on ben being installed because ben query strings cannot be translated into grep-dctrl query strings: ben query splits fields which contain comma separated values at the commas before searching for strings. Unfortunately the script also currently relies on a yet unreleased ben feature which allows ben download to create a ben.cache file. You can track the progress of this feature as bug#714703. Creating a ben.cache file is necessary because some queries rely on an association between binary and source packages which is not present in all packages in a Packages file. For example, the haskell transition query makes use of this feature by including the .source field in its query. The result of these trials was that botch produced nearly the same build order for nearly all transitions. The only differences are due to shortcomings of ben and botch.
For example, ben is not able to create an order for transitions where involved packages form one or more cycles. A prominent example is the haskell transition, where ghc itself is only built during step 15, after many other packages which would have needed ghc to be compiled. Botch solves this problem by reducing each strongly connected component in a cyclic graph to a single vertex before creating the order. This operation makes the graph acyclic and creating a build order trivial. The only remaining problems can then be found within the strongly connected components (for Haskell they are of maximum size two), but the overall build order is correct. On the other hand, botch has no notion of packages which are affected by a transition and thus creates a build order which in some cases is longer than the one created by ben. When generating the interdependencies between packages, ben only considers those which are part of the transition, whereas botch considers all dependency relationships. It would be simple to solve this issue in botch by removing unaffected packages from the dependency graph through edge contraction (an operation already used by botch for other tasks). This exercise also let me find another bug in dose3, where libdose would not associate a binary package with a binNMU version with its associated source package but instead report a version mismatch error. This problem was also reported in the process.
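The condensation idea can be sketched in a few lines. This is not botch's implementation: Tarjan's algorithm collapses each strongly connected component into one vertex, and the resulting acyclic graph is then layered into build steps similar to ben's. In this sketch an edge a -> b means "a needs b to be built first"; the package names in the example are hypothetical.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm; graph maps each vertex to the vertices it depends on."""
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []
    counter = [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component: pop it off
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            sccs.append(frozenset(component))

    for v in sorted(graph):
        if v not in index:
            visit(v)
    return sccs

def build_steps(graph):
    """Group the condensed (acyclic) graph into steps; each step only
    depends on components built in earlier steps."""
    sccs = strongly_connected_components(graph)
    component_of = {v: c for c in sccs for v in c}
    deps = {c: {component_of[w] for v in c for w in graph.get(v, ())
                if component_of[w] != c} for c in sccs}
    built, steps = set(), []
    while len(built) < len(sccs):
        ready = [c for c in sccs if c not in built and deps[c] <= built]
        steps.append(ready)
        built.update(ready)
    return steps
```

With graph = {"ghc": ["libgmp", "haskell-foo"], "haskell-foo": ["ghc"], "haskell-bar": ["haskell-foo"], "libgmp": []}, the cycle between ghc and haskell-foo becomes a single step between libgmp and haskell-bar. The recursive visit is fine for a few hundred packages; for archive-sized graphs an iterative variant avoids Python's recursion limit.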

18 June 2013

Pietro Abate: Bootstrapping Software Distributions

The paper "Bootstrapping Software Distributions", co-authored with Johannes Schauer, has been accepted for publication in the proceedings of CBSE 2013, Vancouver, Canada, June 17-21, 2013.

Abstract: New hardware architectures and custom co-processor extensions are introduced to the market on a regular basis. While it is relatively easy to port a proprietary software stack to a new platform, FOSS distributions face major challenges. Bootstrapping distributions proved to be a yearlong manual process in the past due to a large number of dependency cycles which had to be broken by hand. In this paper we propose a heuristic-based algorithm to remove build dependency cycles and to create a build order for automatically bootstrapping a binary based software distribution on a new platform.
